Parsing AUC Result-Figures in Machine Learning Specific Scholarly Documents for Semantically-enriched Summarization

Iqra Safder; Hafsa Batool; Raheem Sarwar; Farooq Zaman; Naif Radi Aljohani; Raheel Nawaz; Mohamed Gaber; Saeed-Ul Hassan

doi:10.1080/08839514.2021.2004347

Applied Artificial Intelligence (Dec 2022)

Affiliations

Iqra Safder: Information Technology University
Hafsa Batool: Information Technology University
Raheem Sarwar: Research Institute of Information and Language Processing, University of Wolverhampton
Farooq Zaman: Information Technology University
Naif Radi Aljohani: King Abdulaziz University
Raheel Nawaz: Manchester Metropolitan University
Mohamed Gaber: School of Computing and Digital Technology, Birmingham City University
Saeed-Ul Hassan: Management, Manchester Metropolitan University

DOI: https://doi.org/10.1080/08839514.2021.2004347
Journal volume & issue: Vol. 36, no. 1

Abstract

Machine learning specific scholarly full-text documents contain a number of result-figures expressing valuable data, including experimental results, evaluations, and cross-model comparisons. Scholarly search systems often overlook this vital information while indexing important terms using conventional text-based content extraction approaches. In this paper, we propose creating semantically enriched document summaries by extracting meaningful data from the result-figures specific to the evaluation metric of the area under the curve (AUC) and their associated captions from full-text documents. First, we classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model with a ResNet-50 pre-trained on 1.2 million images from ImageNet. Next, we extract information from the result-figures specific to AUC by approximating the region under the function's graph as a trapezoid and calculating its area, i.e., the trapezoidal rule. Using over 12,000 figures extracted from 1000 scholarly documents, we show that figure-specialized summaries contain more enriched terms about figure semantics. Furthermore, we empirically show that the trapezoidal rule can calculate the area under the curve by dividing the curve into multiple intervals. Finally, we measure the quality of the specialized summaries using ROUGE, edit distance, and Jaccard similarity metrics. Overall, we observed that figure-specialized summaries are more comprehensive and semantically enriched. Our research has many applications, including improved document searching, figure searching, and figure-focused plagiarism detection. The data and code used in this paper can be accessed at the following URL: https://github.com/slab-itu/fig-ir/.
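
The trapezoidal rule mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation (their code is in the repository linked above); it is a minimal Python example that assumes the curve has already been recovered from a figure as a list of (x, y) points. The function name trapezoid_auc and the sample ROC points are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def trapezoid_auc(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Approximate the area under a curve sampled at points (xs[i], ys[i]).

    Each pair of neighbouring points defines one trapezoid; the total
    area is the sum of the trapezoid areas over all intervals.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two points are needed")

    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        if width < 0:
            raise ValueError("xs must be sorted in non-decreasing order")
        # Area of one trapezoid: interval width times the mean of its two heights.
        area += width * (ys[i - 1] + ys[i]) / 2.0
    return area


# Hypothetical ROC points: false positive rate (x) and true positive rate (y).
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.6, 0.8, 1.0]
auc = trapezoid_auc(fpr, tpr)
print(f"Approximate AUC: {auc:.2f}")
```

Splitting the x-axis into more intervals (more sampled points) generally brings the approximation closer to the true area, which is the behaviour the abstract refers to when it describes dividing the curve into multiple intervals.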

Published in Applied Artificial Intelligence

ISSN: 0883-9514 (Print); 1087-6545 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Science (General): Cybernetics
Website: https://www.tandfonline.com/journals/uaai

About the journal