Heliyon (Oct 2024)
Deep transformer-based architecture for the recognition of mathematical equations from real-world math problems
Abstract
Identifying mathematical equations from real-world math problems is a unique and challenging task within the field of Natural Language Processing (NLP), with applications in academics, digital content design, and the development of automatic or interactive learning systems. Accurate interpretation of these equations remains difficult because of the intrinsic complexity of mathematical symbols and their varied structural formats; the unique syntax, diverse symbols, and complex structure of mathematical equations pose significant obstacles that traditional NLP methods and Optical Character Recognition (OCR) systems struggle to overcome. In this research, we apply deep transformer architectures to recognize mathematical equations, using a novel dataset of 3433 distinct observations. The dataset, collected to cover a diverse range of mathematical equations, is used to predict six basic equations: z=x+y, z=x−y, z=x∗y, z=x/y, y=x!, and y=√x. We evaluated several transformer-based architectures, including BERT, ELECTRA, XLNet, RoBERTa, and DistilBERT; BERT performs best, achieving 99.80% accuracy. To the best of our knowledge, this is the first NLP work in any language that recognizes the equation directly from the text of a mathematical problem.
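The task described here amounts to six-way sequence classification over problem text. The sketch below shows one way such a classifier could be fine-tuned with Hugging Face Transformers; it is a minimal illustration under stated assumptions, not the authors' released code. The checkpoint (bert-base-uncased), the toy example sentences, the label indexing, and all hyperparameters are assumptions for illustration only.

```python
# Illustrative sketch: fine-tuning BERT for six-way equation classification.
# Checkpoint, toy data, and hyperparameters are assumptions, not taken from the paper.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

# The six target equations from the paper, used here as class labels.
EQUATIONS = ["z=x+y", "z=x-y", "z=x*y", "z=x/y", "y=x!", "y=sqrt(x)"]

class MathProblemDataset(Dataset):
    """Wraps (problem text, equation label) pairs for the classifier."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(EQUATIONS))

# Hypothetical examples standing in for the paper's 3433-observation dataset.
texts = ["Rahim has 3 mangoes and buys 4 more. How many does he have now?",
         "A cake is split equally among 4 children. What does each child get?"]
labels = [0, 3]  # indices into EQUATIONS (addition, division)

loader = DataLoader(MathProblemDataset(texts, labels, tokenizer),
                    batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the six classes
        loss.backward()
        optimizer.step()
```

In practice, a setup like this would be trained on the full dataset with a held-out evaluation split in order to compare architectures as the abstract describes.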