BaitBuster-Bangla: A comprehensive dataset for clickbait detection in Bangla with multi-feature and multi-modal analysis

Abdullah Al Imran; Md Sakib Hossain Shovon; M.F. Mridha

Data in Brief (Apr 2024)

Affiliations

Abdullah Al Imran: Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
Md Sakib Hossain Shovon: Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
M.F. Mridha (corresponding author): Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh

Journal volume & issue: Vol. 53, article no. 110239

Abstract

This study presents a large multi-modal Bangla YouTube clickbait dataset of 253,070 data points, collected automatically using the YouTube API and Python web automation frameworks. Each data point describes an individual video from one of 58 Bangla YouTube channels through 18 features, grouped into metadata, primary content, engagement statistics, and labels. A rigorous preprocessing step was applied to denoise and deduplicate the data and to remove bias from the features, supporting reliable, unbiased analysis. As the largest and most robust Bangla clickbait corpus to date, the dataset is valuable to natural language processing and data science researchers seeking to advance the modeling of clickbait in low-resource languages. Its multi-modal nature supports comprehensive analyses of clickbait across content, user interactions, and linguistic dimensions, enabling more sophisticated detection methods with cross-linguistic applications.
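The abstract mentions a preprocessing step that denoises and deduplicates the collected records but does not spell out the procedure. The following is a minimal Python sketch, not the authors' pipeline. It assumes each record is a dict with hypothetical `video_id` and `title` fields. Deduplication keeps the first record per video ID, and denoising is limited to Unicode NFC normalization and whitespace collapsing. The bias-removal step is not attempted here.

```python
import re
import unicodedata
from typing import Iterable


def clean_title(text: str) -> str:
    """Normalize a title: NFC form, collapse runs of whitespace, strip edges."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text)  # \s also matches Unicode whitespace
    return text.strip()


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record for each (hypothetical) 'video_id' field.

    Records with a missing ID or an empty cleaned title are dropped.
    Returned records carry the cleaned title.
    """
    seen: set[str] = set()
    out: list[dict] = []
    for rec in records:
        vid = rec.get("video_id")
        title = clean_title(rec.get("title", ""))
        if vid is None or vid in seen or not title:
            continue
        seen.add(vid)
        out.append({**rec, "title": title})
    return out


if __name__ == "__main__":
    rows = [
        {"video_id": "a1", "title": "  খবর \n আজ "},
        {"video_id": "a1", "title": "duplicate"},  # same ID, dropped
        {"video_id": "b2", "title": "   "},         # empty title, dropped
        {"video_id": "c3", "title": "\u09df"},      # precomposed য়
    ]
    for row in deduplicate(rows):
        print(row)
```

NFC normalization matters for Bangla text. The same visible letter (for example য়) can be typed either as a single code point or as a base letter followed by a nukta. After normalization both spellings compare equal, so such variants do not survive as spurious duplicates or distinct tokens.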

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
