Comparison of Machine Learning Classification and Clustering Algorithms for TV Commercials Detection

Eman Abdelfattah; Shreehar Joshi

doi:10.1109/ACCESS.2023.3325888

IEEE Access (Jan 2023)

Affiliations

Eman Abdelfattah: School of Computer Science and Engineering, Sacred Heart University, Fairfield, CT, USA
Shreehar Joshi: School of Theoretical and Applied Science, Ramapo College of New Jersey, Mahwah, NJ, USA

DOI: https://doi.org/10.1109/ACCESS.2023.3325888
Journal volume & issue: Vol. 11
pp. 116741 – 116751

Abstract

One of the essential aspects of broadcast monitoring is detecting, and subsequently extracting, commercial blocks in telecast news videos. Research to date has relied almost entirely on preconceived characteristics associated with a channel. With advertisers constantly looking to work around existing policies, relying on the nature of channels during an advertisement does not suffice. The other approach to identifying a commercial is frequentist. However, sponsored programs and other programs often share similar time in any specified hour, rendering the frequentist approach almost useless. This paper therefore uses a machine-learning-based approach, which is more generic and can exploit the inherent differences between commercials and their non-commercial counterparts, to classify and cluster commercials in news videos. The datasets, which contain 90 hours of recordings from five news channels in the US, England, and India, were used to train and test nine classifiers (K Neighbors, Support Vector Machine, Decision Tree, Random Forests, Ada Boost, Gradient Boost, Gaussian NB, Linear Discriminant Analysis, and Quadratic Discriminant Analysis) and five clustering algorithms (K Means, Agglomerative, Birch, Mini-Batch K Means, and Gaussian Mixture). Our results show that Random Forests outperforms all the other classifiers with respect to F1 score and median time to train and test on each of these datasets, each of which consists of features of shots extracted from 18 hours of video. Similarly, Mini-Batch K Means was found to perform best at forming clusters of news and commercials.
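
The comparison described in the abstract (several classifiers ranked by F1 score and by median time to train and test) can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn, whose class names resemble the algorithm names listed above, and it uses synthetic data from `make_classification` as a stand-in for the per-shot features. The `evaluate` helper, the number of runs, the 70/30 split, and all hyperparameters are hypothetical choices made only for illustration.

```python
import time
from statistics import median

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for per-shot feature vectors (NOT the paper's datasets).
# Label 1 plays the role of "commercial", label 0 the role of "news".
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# Three of the nine classifier families named in the abstract.
# Hyperparameters are illustrative assumptions, not the paper's settings.
classifiers = {
    "K Neighbors": lambda: KNeighborsClassifier(),
    "Gaussian NB": lambda: GaussianNB(),
    "Random Forests": lambda: RandomForestClassifier(
        n_estimators=50, random_state=0
    ),
}


def evaluate(make_model, n_runs=3):
    """Return (median F1, median fit+predict seconds) over n_runs splits."""
    f1_scores, durations = [], []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed, stratify=y
        )
        model = make_model()
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        predictions = model.predict(X_te)
        durations.append(time.perf_counter() - start)
        f1_scores.append(f1_score(y_te, predictions))
    return median(f1_scores), median(durations)


results = {name: evaluate(factory) for name, factory in classifiers.items()}
for name, (f1, seconds) in results.items():
    print(f"{name}: median F1={f1:.3f}, median train+test time={seconds:.4f}s")
```

The remaining classifiers could be added to the `classifiers` dictionary in the same way. Scores on synthetic data say nothing about the results reported in the paper.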

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
