The Species196 dataset consists of two parts: Species196-L and Species196-U. To download the data, refer to the download page.

  • Species196-L
    • Species196-L contains 19,236 labeled images of invasive species with bounding-box annotations, covering over 160 genera and more than 60 families across the Insecta, Weeds, and Mollusk superclasses. The detailed hierarchical taxonomy information can be found here.
  • Species196-U
    • Species196-U contains 1,200,000 unlabeled pairs and is a subset of LAION-5B. This large-scale dataset consists of images similar to those in Species196-L and can be used to improve model performance on Species196-L through semi-supervised or self-supervised learning. The metadata is distributed under CC-BY 4.0, while the images remain under their respective copyrights.
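
As one illustration of how unlabeled data can feed semi-supervised training, the sketch below shows confidence-threshold pseudo-labeling, a common generic technique. It is not necessarily the method used in the Species196 paper. The function name `select_pseudo_labels`, the threshold value, and the example probabilities are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Keep only predictions whose top class probability reaches the threshold.

    Returns a list of (sample_index, predicted_class, confidence) tuples.
    """
    selected = []
    for idx, row in enumerate(probs):
        confidence = max(row)
        if confidence >= threshold:
            # Use the most probable class as the pseudo-label.
            selected.append((idx, list(row).index(confidence), confidence))
    return selected


# Hypothetical softmax outputs of a model trained on labeled data,
# evaluated on three unlabeled images over three classes.
example_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.02, 0.93],
]
pseudo_labels = select_pseudo_labels(example_probs, threshold=0.9)
print(pseudo_labels)
```

In such a scheme, the selected samples would be added to the labeled training set with their pseudo-labels, while low-confidence samples (the second image here) are left out.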


2023/09/26 The paper was accepted to the NeurIPS 2023 Datasets and Benchmarks Track.
2023/09/26 The paper is available on arXiv.
2023/05/28 The data will be released on link.
2023/06/13 The data is now available.


      title={Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition}, 
      author={Wei He and Kai Han and Ying Nie and Chengcheng Wang and Yunhe Wang},