Advanced Science (Nov 2024)
Development and Validation of an AI‐Driven System for Automatic Literature Analysis and Molecular Regulatory Network Construction
Abstract
Decoding gene regulatory networks is essential for understanding the mechanisms underlying many complex diseases. This study develops GENET, an automated system designed to extract and visualize extensive molecular relationships from published biomedical literature. Using natural language processing, entities and relations are identified in a randomly selected set of 1788 scientific articles and visualized in a filterable knowledge graph. The performance of GENET is evaluated and compared with that of existing methods. The named entity recognition model achieves an overall precision of 94.23% (4835/5131; 93.56–94.84%), a recall of 97.72% (4835/4948; 97.27–98.10%), and an F1 score of 95.94%. The relation extraction model achieves an overall precision of 91.63% (2593/2830; 90.55–92.59%), a recall of 89.17% (2593/2908; 87.99–90.25%), and an F1 score of 90.38%. GENET significantly outperforms existing methods in extracting molecular relationships (P < 0.001). GENET also predicts that WNT family member 4 (WNT4) regulates insulin-like growth factor 2 (IGF2) via signal transducer and activator of transcription 3 (STAT3) in colon cancer. This prediction is validated with RNA sequencing data and multiplex immunofluorescence, supporting the feasibility of GENET.
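As a worked check of the reported figures, the sketch below recomputes precision (true positives divided by all extracted items), recall (true positives divided by all gold-standard items), and F1 (their harmonic mean) from the counts given in the abstract. It also computes Wilson score intervals. The abstract does not state how its bracketed ranges were obtained, so treating them as 95% Wilson intervals is an assumption, although it reproduces the reported ranges to within about 0.01 percentage points. The function names `prf1` and `wilson_interval` are illustrative only and are not part of GENET.

```python
import math


def prf1(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts.

    tp     -- correctly extracted items (true positives)
    n_pred -- all items the model extracted (TP + FP)
    n_gold -- all items in the gold-standard annotation (TP + FN)
    """
    precision = tp / n_pred
    recall = tp / n_gold
    # Harmonic mean of precision and recall; equivalent to 2*TP / (n_pred + n_gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    Assumption: the abstract's ranges are treated as 95% Wilson intervals
    (z = 1.96); the source does not name the interval method.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Counts as reported in the abstract: (TP, predicted, gold)
    models = {"NER": (4835, 5131, 4948), "RE": (2593, 2830, 2908)}
    for name, (tp, n_pred, n_gold) in models.items():
        p, r, f1 = prf1(tp, n_pred, n_gold)
        p_lo, p_hi = wilson_interval(tp, n_pred)
        r_lo, r_hi = wilson_interval(tp, n_gold)
        print(f"{name}: P={p:.2%} [{p_lo:.2%}, {p_hi:.2%}]  "
              f"R={r:.2%} [{r_lo:.2%}, {r_hi:.2%}]  F1={f1:.2%}")
```

The point estimates printed for both models match the abstract's precision, recall, and F1 values; note that F1 depends only on the three counts, so it can be checked without knowing how the intervals were computed.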
Keywords