Data in Brief (Dec 2021)
ACHENY: A standard Chenopodiaceae image dataset for deep learning models
Abstract
This article presents the dataset associated with “Efficient Deep Learning Models for Categorizing Chenopodiaceae in the wild” (Heidary-Sharifabad et al., 2021). There are about 1500 species of Chenopodiaceae, which are spread worldwide and are often ecologically important. Conserving the biodiversity of these species is critical because of the destructive effects of human activities on them. To that end, Chenopodiaceae species need to be identified and monitored in their natural habitat, a task that deep learning can facilitate. The feasibility of applying deep learning algorithms to identify Chenopodiaceae species depends on access to an appropriate dataset. Therefore, the ACHENY dataset was collected from different bushes of Chenopodiaceae species in their natural habitats, under real-world conditions, in desert and semi-desert areas of Yazd province, Iran. This imbalanced dataset comprises 27,030 RGB color images of 30 Chenopodiaceae species, with 300 to 1461 images per species. Images were taken from multiple bushes of each species, at different camera-to-target distances, viewpoints, and angles, under natural sunlight in November and December. The collected images were not pre-processed other than being resized to 224 × 224 pixels, a size that can be used with some of the successful deep learning models, and were then grouped into their respective classes. The images in each class are split into 72% for training, 18% for validation, and 10% for testing. Test images were, in most cases, manually selected from bushes different from those in the training set. The training and validation images were then randomly separated from the remaining images in each category. ACHENY also includes small 64 × 64 versions of the images, which can be used with some other deep models.
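The per-class 72% / 18% / 10% split described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' procedure: the function name `split_class` and the file names are hypothetical, and the test images are drawn at random here, whereas in ACHENY they were mostly hand-picked from bushes not used for training.

```python
import random


def split_class(images, test_frac=0.10, val_frac=0.18, seed=0):
    """Split one class's image names into train, validation, and test lists.

    Assumption: test images are chosen at random. In ACHENY they were
    mostly selected manually from bushes absent from the training set.
    """
    rng = random.Random(seed)      # local generator, so the split is repeatable
    shuffled = list(images)        # copy, so the caller's list is not modified
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_test = round(n * test_frac)  # 10% of the class held out for testing
    n_val = round(n * val_frac)    # 18% of the class used for validation

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # the remaining ~72% used for training
    return train, val, test


# Hypothetical class with 300 images, the smallest class size reported.
names = [f"img_{i:04d}.jpg" for i in range(300)]
train, val, test = split_class(names)
print(len(train), len(val), len(test))
```

Because the dataset is imbalanced, applying the split class by class, as the description states, keeps the same proportions for every species rather than letting the larger classes dominate the test set.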