Network Biology (Sep 2018)
Analysis of word occurrence frequency and word association in English text file: A big data analytics method
Abstract
In the present study, I present an algorithm for analyzing word occurrence frequency and word association in an English text file. Various delimiters are used to split the text into words. In addition, commonly used grammatical words are ignored in the occurrence and association analysis. All distinct words are listed in descending order of occurrence frequency. Word association is detected using one-dimensional ordered cluster analysis. Words that fall in the same class are likely to be strongly associated. Theoretically, the classes at distinct levels of the clustering hierarchy may represent topics at different hierarchical levels. Java software implementing the algorithm is provided.
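The frequency-counting step described above can be illustrated with a minimal Java sketch. It is not the software provided with the paper: the class name WordFrequencySketch, the delimiter pattern, and the short stop-word list are all assumptions chosen for illustration, and the ordered cluster analysis step is not covered. Requires Java 9 or later (for Set.of).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WordFrequencySketch {
    // Assumed delimiter set: whitespace, digits and common punctuation.
    static final String DELIMITERS = "[\\s\\d.,;:!?\"()\\[\\]{}<>/\\\\-]+";

    // Assumed, deliberately short list of grammatical words to ignore.
    static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "the", "of", "in", "on", "and", "or",
            "is", "are", "was", "were", "to");

    // Returns the distinct words of the text with their counts,
    // ordered from the most frequent to the least frequent.
    static List<Map.Entry<String, Integer>> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split(DELIMITERS)) {
            // Skip empty tokens and ignored grammatical words.
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Descending by count; ties are broken alphabetically so the order is stable.
        sorted.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return sorted;
    }
}
```

Calling WordFrequencySketch.countWords on the contents of a text file yields the ranked word list. A real stop-word list would be considerably longer than the one assumed here, and the choice of delimiters (for example, whether apostrophes or hyphens split words) changes the resulting counts.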