Data in Brief (Dec 2024)

Bangla news article dataset

  • Asif Mohammed Saad,
  • Umme Niraj Mahi,
  • Md. Shahidul Salim,
  • Sk Imran Hossain

Journal volume & issue
Vol. 57
p. 110874

Abstract

We present an updated standard Bangla dataset built from Bangla news articles. More than 1.9 million articles were gathered from nine Bangla news websites, with selection guided by a range of categories, including sports, economy, politics, local news, tech, tourism, entertainment, education, health, the arts, and many more. The attributes available vary by newspaper and include title, content, time, tags, meta, and category, among others. The dataset will enable data scientists to investigate and assess theories related to Bangla natural language processing. It is also likely to be used for domain-specific large language models in the context of Bangladesh, and it may be used to develop deep learning and machine learning models that categorize articles by subject.
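
Because the attributes differ from one newspaper to another, a consumer of the data would likely need to bring records into a common shape before any category-level analysis. The following is a minimal sketch of that idea, not the authors' tooling: the field names are taken from the attributes listed in the abstract, while the record layout (one dict per article), the helper names `normalize` and `category_counts`, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

# Field names follow the attributes named in the abstract; the actual
# per-newspaper schema is an assumption here.
FIELDS = ("title", "content", "time", "tags", "meta", "category")


def normalize(record):
    """Return a dict with every expected field, using None where one is missing."""
    return {field: record.get(field) for field in FIELDS}


def category_counts(records):
    """Count articles per category, skipping records that have no category."""
    normalized = (normalize(r) for r in records)
    return Counter(r["category"] for r in normalized if r["category"])


# Made-up sample records standing in for articles from different sources.
sample = [
    {"title": "Match report", "content": "...", "category": "sports"},
    {"title": "Budget update", "content": "...", "time": "2024-01-01", "category": "economy"},
    {"title": "League table", "content": "...", "tags": ["football"], "category": "sports"},
    {"title": "Untitled", "content": "..."},
]

counts = category_counts(sample)
```

Filling missing fields with `None` keeps every record the same shape, so later steps such as building a subject classifier can treat articles from all sources uniformly.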

Keywords