Journal of Non-Crystalline Solids: X (Sep 2022)
Natural language processing-guided meta-analysis and structure factor database extraction from glass literature
Abstract
Although scientific journals are a reliable, peer-reviewed source of data, manually extracting relevant information from papers is often too tedious. This can be attributed to the unstructured nature of the data, such as images, text, and captions, and to the non-standard reporting of data in tables. Here, using natural language processing (NLP), we introduce a corpus of ~100,000 glass science-related research papers and the 106,238 images published in them, which allows easy navigation and query-based searching through the database. We perform a meta-analysis of the literature in the corpus employing NLP tools. Specifically, we analyze trends in the number of publications by country, research area, and journal, thereby giving a broad overview of the progress in glass science over the last six decades. Further, as a demonstration of information extraction, we extract the structure factor data of ~450 glass compositions, thereby creating the first-ever public repository on the structure factor of glasses.