Scientific Reports (Mar 2025)

Research on movie rating based on BERT-base model

  • Weijun Ning
  • Fuwei Wang
  • Weimin Wang
  • Haodong Wu
  • Qihao Zhao
  • Tianxin Zhang

DOI
https://doi.org/10.1038/s41598-025-92430-w
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 14

Abstract

With the advent of the Internet, movie reviews have emerged as a crucial reference for users in selecting films and hold significant value in guiding filmmakers and platforms in content recommendation. Consequently, accurate classification of movie reviews has extensive practical applications. Traditional manual classification methods, however, are not only time-intensive and laborious but also susceptible to subjective bias. In response, automated classification techniques leveraging deep learning have become a promising alternative. Among these, the BERT model, renowned for its bidirectional encoder architecture, excels in contextual understanding and semantic representation. Nevertheless, it faces challenges in capturing long-range word dependencies and fully extracting local features in lengthy texts. Moreover, model bias stemming from sensitive information, such as gender and race, embedded in the data can compromise the fairness of classification outcomes. To address these limitations, this study introduces several enhancements to the BERT model. First, a dynamic positional offset encoding mechanism grounded in attention is employed to replace traditional absolute positional encoding, thereby enhancing the model’s capacity to process positional information. Second, a dynamic weighted fusion pooling strategy is proposed, integrating average pooling, maximum pooling, and self-attention pooling to improve the comprehensiveness of feature extraction. Additionally, during data preprocessing, sensitive attributes such as gender and race are mitigated through the removal or obfuscation of specific terms or features, combined with data augmentation techniques including easy data augmentation (EDA) and noise injection to generate neutral review samples. This approach reduces potential biases and enhances the model’s generalization capabilities. Experimental results on the IMDb movie review dataset demonstrate the efficacy of the proposed improvements, with the improved BERT model achieving a 0.73% increase in F1 score and a 0.90% improvement in accuracy, thereby validating the effectiveness of the modifications.
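
The weighted fusion of average, maximum, and self-attention pooling described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the fusion weights are computed, so this is not the authors' implementation: the names `fused_pool`, `query`, and `gate_logits` are hypothetical, the attention scores are assumed to be scaled dot products with a single learned query vector, and the fusion weights are assumed to be a softmax over three learnable logits. Plain Python lists stand in for the token embeddings that BERT would produce.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(tokens):
    """Average each embedding dimension over all tokens."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]

def max_pool(tokens):
    """Take the maximum of each embedding dimension over all tokens."""
    return [max(col) for col in zip(*tokens)]

def attention_pool(tokens, query):
    """Weight tokens by softmax of scaled dot products with a query vector.

    `query` is a hypothetical learned parameter, assumed for illustration.
    """
    scale = math.sqrt(len(query))
    scores = [sum(t * q for t, q in zip(tok, query)) / scale for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(len(query))]

def fused_pool(tokens, query, gate_logits):
    """Combine the three pooled vectors with softmax-normalized weights.

    `gate_logits` holds three hypothetical learnable logits, one per
    pooling branch (mean, max, attention); this gating is an assumption.
    """
    pooled = [mean_pool(tokens), max_pool(tokens), attention_pool(tokens, query)]
    gates = softmax(gate_logits)
    return [sum(g * p[d] for g, p in zip(gates, pooled))
            for d in range(len(query))]

# Two tokens with two-dimensional embeddings (toy values).
tokens = [[1.0, 0.0], [3.0, 2.0]]
sentence_vector = fused_pool(tokens, query=[0.0, 0.0], gate_logits=[0.0, 0.0, 0.0])
```

With a zero query the attention branch reduces to mean pooling, and equal gate logits give each branch a weight of one third, so `sentence_vector` is approximately `[2.33, 1.33]`. In a trained model the gate logits would shift weight toward whichever pooling branch is most useful for the classification task.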

Keywords