IEEE Access (Jan 2025)
MathVision: An Accessible Intelligent Agent for Visually Impaired People to Understand Mathematical Equations
Abstract
According to the World Health Organization, 2.2 billion people worldwide have some form of vision impairment. Children with visual impairment may experience delayed physical, linguistic, and cognitive development, resulting in lower levels of academic achievement. Many visually impaired people take part in the education sector, whether as students or as teachers. Without external assistance, reading mathematical equations embedded in images is very challenging for visually impaired people because of the complexity of the notation, symbols, and variables. This paper presents MathVision, a model that converts mathematical equations into voice, helping visually impaired people understand them. The proposed model uses the YOLOv7 object detection architecture to detect mathematical equations in images and categorize them into four distinct types: limits, trigonometry, integration, and an additional category. The YOLOv7 model divides the input image into a grid, each grid cell is responsible for detecting the equations that fall within it, and bounding-box coordinates, class labels, and confidence scores are predicted for each equation. In the next stage, a fine-tuned DenseNet is used for detailed feature extraction from equation images; a pre-trained DenseNet model is optimized to capture intricate patterns specific to equations, which improves the overall accuracy of equation detection and categorization in the system. In the subsequent phase, an attention-based LSTM network generates natural language descriptions for the detected equations. The integration of attention allows the model to focus on the pertinent portions of an equation during decoding. The LSTM architecture, chosen for its effectiveness with sequential data, is trained on a dataset of paired equations and corresponding human-written descriptions. Fine-tuning includes optimizing the hyperparameters for this task, and evaluation metrics such as the BLEU score are used to assess how accurately and contextually relevantly the model generates textual representations of the detected mathematical content. The text-to-speech (TTS) system takes the natural language sentence generated by the LSTM model as input and converts it into voice: it analyzes and processes the text with natural language processing and then synthesizes speech from the processed text using digital signal processing. The platform-independent pyttsx3 Python library is used for the conversion; it also works offline, which is the main reason for choosing it in this work. Because no dataset of mathematical equations with natural language descriptions was available, we created a custom dataset. We conducted real-world experiments in several schools for the visually impaired to determine whether visually impaired students can understand mathematical equations by listening to the generated voice. These experiments show that MathVision is an effective way for visually impaired students to read and write mathematical equations by listening to the spoken form of the equations produced by the proposed model.
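To make the evaluation step concrete, the short sketch below computes a sentence-level BLEU score between a human-written reference description and a model-generated candidate. NLTK is used here only as one readily available implementation of the BLEU metric mentioned above, and the two sentences are hypothetical examples, not items from the MathVision dataset.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference (human-written) and candidate (model-generated) descriptions
reference = "the integral from zero to one of x squared with respect to x".split()
candidate = "integral from zero to one of x squared with respect to x".split()

# Smoothing avoids zero scores when higher-order n-grams have no matches
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```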
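The final text-to-speech stage can be illustrated with a minimal offline sketch using the pyttsx3 library named above; the example sentence and the speaking-rate setting are illustrative assumptions rather than the exact configuration used in MathVision.

```python
import pyttsx3

def speak_description(description: str) -> None:
    """Convert a generated equation description to speech offline with pyttsx3."""
    engine = pyttsx3.init()            # uses the platform's default TTS driver
    engine.setProperty("rate", 150)    # illustrative speaking rate (words per minute)
    engine.say(description)            # queue the sentence for playback
    engine.runAndWait()                # block until speech playback finishes

# Example: a description such as the attention-based LSTM decoder might produce
speak_description("The integral from zero to one of x squared with respect to x.")
```

Because pyttsx3 relies on the operating system's own speech drivers (SAPI5, NSSpeechSynthesizer, or eSpeak), no network connection is required, which matches the offline requirement stated above.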
Keywords
Mathematical equations, fine-tuning, YOLOv7, convolutional neural network, attention mechanism, long short-term memory, neural text-to-speech, technological development