Data in Brief (Jun 2024)

Bangla_MER: A unique dataset for Bangla mathematical entity recognition

  • Tanjim Taharat Aurpa,
  • Samiha Maisha Jeba,
  • Md Shoaib Ahmed,
  • Mohammad Aman Ullah,
  • Maria Mehzabin,
  • Md Musfique Anwar

Journal volume & issue
Vol. 54
p. 110407

Abstract


Mathematical entity recognition is essential for machines to interpret and represent mathematical content accurately and to support mathematical operations and reasoning. Because mathematical entity recognition in the Bangla language is novel, to the best of our knowledge no dataset for it is available in any repository. In this paper, we present a state-of-the-art Bangla mathematical entity dataset containing 13,717 observations. Each record consists of a mathematical statement, a mathematical type, and a mathematical entity. The dataset can be used for research on recognizing mathematical operators, well-known mathematical terms (such as complex numbers, real numbers, and prime numbers), and numeric operands. Combinations of these recognition tasks are also feasible with modest adjustments to the dataset. Furthermore, we provide the dataset in raw form as a CSV file with three columns: text, math entity, and label. As a result, researchers can handle the data easily, which facilitates a variety of deep learning and machine learning explorations.
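
As a rough illustration of how a three-column CSV of this kind might be handled, the following minimal Python sketch reads records and groups them by label. The header names (text, math_entity, label), the label values, and the sample rows are assumptions made up for this example; they are not taken from the Bangla_MER file, whose exact headers and label set may differ.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows, NOT taken from Bangla_MER. The header names
# (text, math_entity, label) and the label values are assumptions.
SAMPLE_CSV = """text,math_entity,label
"৫ একটি মৌলিক সংখ্যা",মৌলিক সংখ্যা,term
"৩ যোগ ৪",যোগ,operator
"৩ যোগ ৪",৩,number
"""

REQUIRED_COLUMNS = ("text", "math_entity", "label")


def load_records(handle):
    """Read CSV rows into dicts, skipping rows with a missing or empty field."""
    reader = csv.DictReader(handle)
    records = []
    for row in reader:
        if all((row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            records.append(row)
    return records


def label_counts(records):
    """Count how many records carry each label."""
    return Counter(record["label"] for record in records)


def filter_by_label(records, label):
    """Keep only the records whose label matches exactly."""
    return [record for record in records if record["label"] == label]


records = load_records(io.StringIO(SAMPLE_CSV))
counts = label_counts(records)
operators = filter_by_label(records, "operator")
```

To read an actual file instead of the in-memory sample, one could pass a handle opened with `open(path, encoding="utf-8", newline="")` to `load_records`, where `path` is a hypothetical location of the downloaded CSV and the column names are adjusted to match the real header row.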

Keywords