Enhanced Multi-Label Question Tagging on Stack Overflow: A Two-Stage Clustering and DeBERTa-Based Approach

Isun Chehreh; Farzaneh Saadati; Ebrahim Ansari; Bahram Sadeghi Bigham

doi:10.5281/zenodo.14166315

Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2024)

Affiliations

Isun Chehreh: Institute for Advanced Studies in Basic Sciences
Farzaneh Saadati: University of Georgia
Ebrahim Ansari: Institute for Advanced Studies in Basic Sciences
Bahram Sadeghi Bigham: Alzahra University

Journal volume & issue: Vol. 36, no. 2
pp. 858 – 863

Abstract

This paper introduces a novel method for automatic multi-label classification of questions, using data sourced from Stack Overflow. Traditional tagging methods often struggle with the complexity and semantic diversity of these questions, which leads to inconsistent and sometimes inaccurate tags. The proposed pipeline starts with preprocessing to remove unwanted elements. The questions are then converted into semantic vector representations using SMPNet. These vectors are reduced in dimensionality with UMAP, which exposes the overall structure of the data and makes similar items easier to cluster. The reduced vectors are grouped into clusters with K-Means, with the number of clusters chosen by the Silhouette Score. Finally, a DeBERTa model is fine-tuned for each cluster to predict the appropriate tags. The approach significantly outperforms traditional methods, achieving a 2% improvement over the best baseline. Training one model per cluster also improves model efficiency by narrowing each model's focus to a specific subset of the data.
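
The clustering stage described in the abstract (K-Means, with the number of clusters chosen by the Silhouette Score) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the questions have already been embedded and reduced to low-dimensional vectors, and it substitutes synthetic 2-D points for those vectors. The function names (`farthest_point_init`, `kmeans`, `silhouette_score`, `choose_k`), the deterministic seeding strategy, and the candidate range for k are all illustrative assumptions. In practice a library such as scikit-learn would likely be used instead of hand-written routines.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def choose_k(points, candidates):
    """Score each candidate k and return the best k plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic stand-in for reduced question embeddings: three separated blobs.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

chosen_k, scores = choose_k(points, range(2, 7))
print("chosen k:", chosen_k)
```

In the pipeline the abstract describes, each resulting cluster would then receive its own fine-tuned tag-prediction model. That step is omitted here because it depends on training details the abstract does not give.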

Published in Proceedings of the XXth Conference of Open Innovations Association FRUCT

ISSN: 2305-7254 (Print); 2343-0737 (Online)
Publisher: FRUCT
Country of publisher: Finland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://fruct.org/publication
