IEEE Access (Jan 2024)

Enhancing User Experience on Q&A Platforms: Measuring Text Similarity Based on Hybrid CNN-LSTM Model for Efficient Duplicate Question Detection

  • Muhammad Faseeh,
  • Murad Ali Khan,
  • Naeem Iqbal,
  • Faiza Qayyum,
  • Asif Mehmood,
  • Jungsuk Kim

DOI
https://doi.org/10.1109/ACCESS.2024.3358422
Journal volume & issue
Vol. 12
pp. 34512 – 34526

Abstract

Read online

This research introduces an innovative approach for identifying duplicate questions within the Stack Overflow community, a challenging task in NLP. Leveraging deep learning techniques, our proposed methodology combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to capture both local and long-term dependencies in textual data. We employ word embeddings, specifically Google’s Word2Vec and GloVe, to enhance text representation. Extensive experiments on the Stack Overflow dataset demonstrate the effectiveness of our approach, achieving an impressive accuracy of 87.09% and a recall rate of 87.%. The integration of CNN and LSTM models significantly streamlines preprocessing, making it a valuable tool for detecting duplicate questions. Future directions include extending the model to multiple languages and exploring alternative word embedding techniques. Our approach presents promising applications beyond Stack Overflow, offering solutions for identifying similar questions on various QA platforms.

Keywords