IEEE Access (Jan 2025)

Information Extraction Based on Line Chart for Research Paper in Chemical Science

  • Hairong Yan,
  • Shaohan Yang

DOI
https://doi.org/10.1109/ACCESS.2024.3520222
Journal volume & issue
Vol. 13
pp. 297 – 303

Abstract

In chemical science research, reading other researchers' papers in the same area speeds up the development of one's own research methods. Researchers want to automate the extraction of information from charts in papers, because the charts record which elements are involved and how the experimental conditions change. This paper designs and implements a line chart information extraction algorithm using neural networks and the Hough transform. First, a large dataset of line charts was collected and annotated to provide a foundation for neural network training. Second, the line charts in the literature were detected and saved as separate images. Then, the Hough transform line detection algorithm was used to detect the axes, and the line charts were segmented. For each segmented part, a dedicated recognition algorithm was designed to identify the corresponding elements of the line chart, including axes, line regions, and legends. To validate the effectiveness of the algorithm, experimental tests were conducted in the field of inorganic catalysis, automatically extracting information from line charts in the literature. The experimental results show that the designed algorithm can accurately recognize the various elements in line charts and effectively extract experimental data. Compared with traditional manual methods, automated extraction not only saves a considerable amount of time but also improves the accuracy and consistency of data extraction when papers are read quickly. In summary, this method provides researchers with an efficient tool that accelerates the acquisition and comparison of experimental data, thereby advancing related research on electronic documents.
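The axis-detection and segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not show. It assumes the chart has already been binarized into a grid of 0/1 pixels, and that the axes are the longest vertical and horizontal lines in the image. The function names `hough_votes`, `find_axes`, and `plot_region` are hypothetical, and the sketch uses only the Python standard library rather than any particular image-processing package.

```python
import math


def hough_votes(points, n_theta=180):
    """Accumulate Hough votes in (rho, theta-index) space.

    Each foreground pixel (x, y) votes for every line
    rho = x*cos(theta) + y*sin(theta) that passes through it.
    """
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    votes = {}
    for x, y in points:
        for t, theta in enumerate(thetas):
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return votes


def find_axes(image, n_theta=180):
    """Return (x_axis_row, y_axis_col) for a binarized chart.

    Assumption: the y axis is the strongest vertical line (theta = 0,
    so rho equals the column) and the x axis is the strongest
    horizontal line (theta = 90 degrees, so rho equals the row).
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    votes = hough_votes(points, n_theta)
    t_vertical, t_horizontal = 0, n_theta // 2
    y_axis_col = max((v, rho) for (rho, t), v in votes.items()
                     if t == t_vertical)[1]
    x_axis_row = max((v, rho) for (rho, t), v in votes.items()
                     if t == t_horizontal)[1]
    return x_axis_row, y_axis_col


def plot_region(image, x_axis_row, y_axis_col):
    """Crop the area above the x axis and to the right of the y axis."""
    return [row[y_axis_col + 1:] for row in image[:x_axis_row]]


# Toy 20x15 chart: y axis in column 2, x axis in row 12, one data line.
chart = [[0] * 20 for _ in range(15)]
for y in range(13):
    chart[y][2] = 1
for x in range(2, 20):
    chart[12][x] = 1
for i in range(8):
    chart[10 - i][4 + 2 * i] = 1

x_axis_row, y_axis_col = find_axes(chart)
region = plot_region(chart, x_axis_row, y_axis_col)
```

In this toy chart the detected axes split the image into a plot region and the margins where tick labels would sit. A real pipeline would hand each part to its own recognizer, as the abstract describes for axes, line regions, and legends.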

Keywords