Applied Sciences (Jan 2023)

Punctuation Restoration with Transformer Model on Social Media Data

  • Adebayo Mustapha Bakare,
  • Kalaiarasi Sonai Muthu Anbananthen,
  • Saravanan Muthaiyah,
  • Jayakumar Krishnan,
  • Subarmaniam Kannan

DOI
https://doi.org/10.3390/app13031685
Journal volume & issue
Vol. 13, no. 3
p. 1685

Abstract

Sentiment analysis faces several key challenges. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts can carry multiple sentiment values, and predicting a single overall sentiment for such a paragraph will not surface all the information businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which all the possible sentiments can be extracted effectively. Splitting a paragraph this way, however, requires that it be properly punctuated, and most social media texts are improperly punctuated, so separating the sentences is challenging. This study proposes a punctuation-restoration algorithm based on the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models as the transformer encoder, as well as different neural networks placed on top of the encoder. In our evaluation, RoBERTa-large with a bidirectional long short-term memory (LSTM) layer achieved the best accuracy: 97% and 90% for restoring punctuation on Amazon and Telekom data, respectively. Precision, recall, and F1-score were also used as evaluation criteria.
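
For illustration, the following is a minimal sketch of the encoder-plus-BiLSTM token-classification architecture the abstract describes, written in PyTorch with the Hugging Face transformers library. It is not the authors' released implementation; the punctuation label set, LSTM hidden size, and inference details are assumptions made for the example.

    # Sketch: RoBERTa-large encoder + bidirectional LSTM head that assigns a
    # punctuation label to each token (assumed label inventory, not from the paper).
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # assumed label set

    class PunctuationRestorer(nn.Module):
        def __init__(self, encoder_name="roberta-large", lstm_hidden=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            self.lstm = nn.LSTM(
                input_size=self.encoder.config.hidden_size,
                hidden_size=lstm_hidden,
                batch_first=True,
                bidirectional=True,
            )
            self.classifier = nn.Linear(2 * lstm_hidden, len(PUNCT_LABELS))

        def forward(self, input_ids, attention_mask):
            # Contextual token embeddings from the pretrained encoder.
            hidden = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state               # (batch, seq_len, hidden_size)
            lstm_out, _ = self.lstm(hidden)   # (batch, seq_len, 2 * lstm_hidden)
            return self.classifier(lstm_out)  # per-token punctuation logits

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = PunctuationRestorer()
    batch = tokenizer(["great product would buy again"], return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])
    preds = logits.argmax(dim=-1)  # one punctuation label per subword token

Restoring punctuation this way reduces sentence splitting to per-token classification: once each token is labeled, the text can be segmented at predicted sentence-ending marks before sentence-level sentiment analysis.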

Keywords