Medicine Advances (Dec 2023)
Research on a data mining algorithm based on BERTopic for medication rules in Traditional Chinese Medicine prescriptions
Abstract
Background: A data mining algorithm based on BERTopic is proposed to provide new insights into the analysis of medication rules in Traditional Chinese Medicine (TCM) prescriptions.
Methods: Using the BERTopic algorithm, collected TCM prescriptions for corneal diseases are converted to embeddings by a pre‐trained Bidirectional Encoder Representations from Transformers (BERT) model. Uniform Manifold Approximation and Projection (UMAP) is then applied to reduce the dimensionality of the prescription embeddings. Subsequently, Hierarchical Density‐Based Spatial Clustering of Applications with Noise (HDBSCAN) is used for clustering. Finally, class‐based term frequency–inverse document frequency (c‐TF‐IDF) is used to generate several main drug combinations from the clustered results.
Results: The most frequently used drugs were Buddleja officinalis, Bidens pilosa, Angelica sinensis, Eriocaulon buergerianum, and Raw Rehmannia glutinosa. The most frequent drug combinations were “Eriocaulon buergerianum, Raw Rehmannia glutinosa, Prunella vulgaris, Notopterygium incisum,” “Lycii Fructus, Bidens pilosa, Buddleja officinalis,” and “Kochiae Fructus, Cortex Dictamni.”
Conclusions: The proposed data mining algorithm based on BERTopic showed promising results in the analysis of TCM prescription medication rules. The method is simple and efficient, and it offers a new avenue for this kind of analysis.
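The final step of the pipeline, class‐based TF‐IDF, can be illustrated with a minimal sketch in plain Python. It assumes the weighting commonly used by BERTopic, in which each cluster is treated as one document and a term t in cluster c is scored as tf(t, c) × log(1 + A / f(t)), where A is the average number of tokens per cluster and f(t) is the total count of t across all clusters. The function names (`c_tf_idf`, `top_herbs`) and the toy prescriptions are hypothetical and are not taken from the study's data or code; the embedding, UMAP, and HDBSCAN steps are omitted, and cluster assignments are assumed to be already available.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score herbs per cluster with class-based TF-IDF.

    clusters: dict mapping a cluster id to a list of prescriptions,
    where each prescription is a list of herb names.
    """
    # Treat each cluster as one concatenated document and count its herbs.
    tf = {cid: Counter(h for rx in rxs for h in rx) for cid, rxs in clusters.items()}

    # Total count of each herb across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # Average number of herb tokens per cluster.
    avg_tokens = sum(total.values()) / len(tf)

    scores = {}
    for cid, counts in tf.items():
        n = sum(counts.values())  # tokens in this cluster, used to normalise tf
        scores[cid] = {
            herb: (count / n) * math.log(1 + avg_tokens / total[herb])
            for herb, count in counts.items()
        }
    return scores


def top_herbs(scores, k=2):
    """Return the k highest-scoring herbs per cluster (ties broken by name)."""
    return {
        cid: [h for h, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cid, s in scores.items()
    }


# Toy clusters, invented for illustration only (NOT data from the study).
clusters = {
    0: [
        ["Kochiae Fructus", "Cortex Dictamni"],
        ["Kochiae Fructus", "Cortex Dictamni", "Angelica sinensis"],
    ],
    1: [
        ["Lycii Fructus", "Bidens pilosa", "Buddleja officinalis"],
        ["Lycii Fructus", "Buddleja officinalis", "Angelica sinensis"],
    ],
}

scores = c_tf_idf(clusters)
top = top_herbs(scores, k=2)
print(top)
```

In this toy example, Angelica sinensis appears in both clusters, so its inverse-frequency weight is shared between them and it ranks below herbs concentrated in a single cluster. This is the property that lets the score surface cluster-specific drug combinations rather than herbs that are merely common overall.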
Keywords