Data in Brief (Oct 2020)

2D geometric shapes dataset – for machine learning and pattern recognition

  • Anas El Korchi,
  • Youssef Ghanou

Journal volume & issue
Vol. 32
p. 106090

Abstract

Read online

In this paper, we present a data article that describes a dataset of nine 2D geometric shapes, and each shape is drawn randomly on a 200 × 200 RGB image. During the generation of this dataset, the perimeter and the position of each shape are selected randomly and independently for each image. The rotation angle of each shape is chosen randomly for each image within an interval between -180° and 180°, as well as the background colour of each image and the filling colour of each shape are selected randomly and independently.The published dataset is composed of 9 classes of data, and each class represent a type of geometric shape (Triangle, Square, Pentagon, Hexagon, Heptagon, Octagon, Nonagon, Circle and Star). Each class is composed of 10k generated images. This paper also includes a GitHub URL to the generator source code used for the generation, which can be reused to generate any desired size of data.The proposed dataset aims to provide a perfectly clean dataset, for classification as well as clustering purposes. The fact that this dataset is generated synthetically provides the ability to use it to study the behaviour of machine learning models independently of the nature of the dataset or the possible noise or data leak that can be found in any other datasets. Moreover, the choice of a 2D geometrical shape dataset provides the ability to understand as well to have good knowledge of the number of patterns stored inside each data class.

Keywords