Berkala Sainstek (Dec 2022)

Implementation of K-Means Clustering Method for Trend Analysis of Thesis Topics (Case Study: Faculty of Computer Science, University of Jember)

  • Maulana Rafael Irianto,
  • Achmad Maududie,
  • Fajrin Nurman Arifin

DOI
https://doi.org/10.19184/bst.v10i4.29524
Journal volume & issue
Vol. 10, no. 4
pp. 210 – 226

Abstract

Advances in information technology have produced a large number of digital documents, thesis documents in particular, which makes it easy for students to choose the same topics instead of varied ones. Thesis documents can be grouped by topic based on their abstracts, and the resulting groups can be visualized and analyzed to determine the trend of each topic. The data, 490 thesis documents by students of the Faculty of Computer Science, University of Jember, were retrieved from the University of Jember repository by web scraping. Preprocessing is carried out with text mining methods: cleaning, filtering, stemming, and tokenizing. Each word is then weighted with the Term Frequency - Inverse Document Frequency (TF-IDF) algorithm, and the dimensionality is reduced with Principal Component Analysis after Z-Score normalization. Outliers are removed before the documents are clustered. The documents are then grouped using the K-Means Clustering method with Cosine Similarity as the distance measure, and the clustering is evaluated with the Silhouette Coefficient. Tests with various values of k gave the optimal result at k = 2, with a Silhouette value of 0.80. Topic detection is then performed with the Latent Dirichlet Allocation algorithm on each cluster that has been formed. Each cluster is visualized with a line chart and a linear trend line, and analyzed to determine its trend. The analysis shows that the topic of Decision Support System Development is trending down, while the topic of IT Performance Measurement and Forecasting is trending up. It can be concluded that the topic of Decision Support System Development needs to be reduced so that other topics can emerge.
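The core of the pipeline described above (TF-IDF weighting, K-Means with cosine similarity, and choosing k by the Silhouette Coefficient) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it skips web scraping, stemming, Z-Score normalization, Principal Component Analysis, outlier removal, and Latent Dirichlet Allocation. The function names, the six toy token lists, and the deterministic farthest-point seeding are all assumptions made for illustration, and the centroids are re-normalized to unit length (the variant often called spherical k-means), which is one common way to use cosine similarity with K-Means.

```python
import math
from collections import Counter


def tfidf(docs):
    """L2-normalised TF-IDF vectors (term -> weight dicts) for tokenised docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, iters=20):
    """K-means that assigns each document to its most similar centroid."""
    # Deterministic farthest-point seeding (an assumption, not from the paper).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            total = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


def silhouette(vectors, labels):
    """Mean silhouette coefficient using cosine distance (1 - similarity)."""
    scores = []
    for i, v in enumerate(vectors):
        by_cluster = {}
        for j, u in enumerate(vectors):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(1.0 - cosine(v, u))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_k(vectors, candidates):
    """Return (best_k, score) maximising the silhouette coefficient."""
    scored = [(silhouette(vectors, kmeans_cosine(vectors, k)), k) for k in candidates]
    best_score, best_k = max(scored)
    return best_k, best_score


# Hypothetical, already-preprocessed abstracts (toy data, not from the study).
docs = [
    ["decision", "support", "system", "ahp"],
    ["decision", "support", "system", "topsis"],
    ["decision", "support", "system", "saw"],
    ["forecasting", "performance", "measurement", "cobit"],
    ["forecasting", "performance", "measurement", "arima"],
    ["forecasting", "performance", "measurement", "itil"],
]
vectors = tfidf(docs)
labels = kmeans_cosine(vectors, 2)
best_k, best_score = pick_k(vectors, [2, 3])
print(labels, best_k, round(best_score, 2))
```

On this toy input the two topic groups separate cleanly and k = 2 scores higher than k = 3. The Silhouette value is far lower than the 0.80 reported in the abstract because the sketch works on raw TF-IDF vectors of six tiny documents rather than on the PCA-reduced, outlier-filtered data used in the study.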