IEEE Access (Jan 2024)

WeVoTe: A Weighted Voting Technique for Automatic Sentiment Annotation of Moroccan Dialect Comments

  • Yassir Matrane,
  • Faouzia Benabbou,
  • Zouheir Banou

DOI
https://doi.org/10.1109/ACCESS.2024.3359430
Journal volume & issue
Vol. 12
pp. 16276 – 16298

Abstract

Sentiment analysis (SA) is the systematic process of automatically determining the polarity of a text document, and many sectors can benefit substantially from it. SA involves several phases, the first of which is annotation, a step that is often time-consuming and laborious. Research on automating this step remains scarce. The task becomes more difficult when the texts are written in ‘Darija’, a form of the Moroccan dialect (MD). In this work, we introduce a novel automatic annotation method designed explicitly for sentiment analysis in the Moroccan dialect. A pivotal aspect of our contribution is a refinement of the stacking approach that uses a weighted voting technique to improve predictive accuracy. The method starts by training various neural network models on six distinct MD datasets. The neural network architectures were selected through a comprehensive grid search, which showed that models based on Recurrent Neural Networks (RNNs) outperformed the others. We then deployed an augmented stacking model built on the weighted voting technique. This model takes the predictions generated by the neural networks as inputs. The mode of these inputs serves as an output that feeds directly into a meta-classifier, which in turn produces the coefficients. These coefficients are then multiplied with the initial neural network predictions to derive the final outputs. To evaluate how well the proposed method annotates the six datasets, each dataset was held out in turn as the test set while the remaining five served as training sets. As a result, the annotation results for four of the six datasets outperformed the established standards, attaining agreement rates of 87.54% for MSAC, 91.25% for FB, 85.10% for MSDA, and 83.60% for MSTD, all of which represent new achievements in the literature.
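The weighted voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary sentiment labels, base models that output positive-class probabilities, and a logistic-regression meta-classifier whose learned weights play the role of the coefficients. The function names, the synthetic data, and the noise levels are all hypothetical.

```python
import numpy as np

def majority_vote(votes):
    """Mode of binary votes per row (ties go to the positive class)."""
    return (votes.mean(axis=1) >= 0.5).astype(int)

def fit_vote_weights(probs, target, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-classifier by gradient descent.

    probs:  (n_samples, n_models) positive-class probabilities
    target: (n_samples,) labels the meta-classifier should reproduce
    Returns one coefficient per base model and a bias term.
    """
    n, m = probs.shape
    x = probs - 0.5              # center so that 0 means "undecided"
    w = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - target        # gradient of the log loss w.r.t. the logit
        w -= lr * (x.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def weighted_vote(probs, w, b):
    """Multiply base predictions by their coefficients and threshold the sum."""
    score = (probs - 0.5) @ w + b
    return (score >= 0).astype(int)

def agreement_rate(pred, gold):
    """Fraction of automatic annotations that match the reference labels."""
    return float((pred == gold).mean())

# Synthetic stand-in for three base models of differing quality (assumption).
rng = np.random.default_rng(0)
gold = rng.integers(0, 2, size=400)
noise = rng.normal(0.0, [0.3, 0.4, 0.9], size=(400, 3))
probs = np.clip(gold[:, None] + noise, 0.0, 1.0)

# Fit the coefficients on one part of the data, annotate the held-out part.
train, test = slice(0, 300), slice(300, 400)
mode_labels = majority_vote((probs[train] >= 0.5).astype(int))
w, b = fit_vote_weights(probs[train], mode_labels)
pred = weighted_vote(probs[test], w, b)
rate = agreement_rate(pred, gold[test])
print("coefficients:", np.round(w, 3), "agreement rate:", round(rate, 4))
```

In the evaluation protocol described above, the coefficients would instead be fitted on five datasets and the agreement rate measured on the sixth, repeated once per dataset. The train/test split here only mimics that idea on toy data.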

Keywords