Novelty Detection in Social Media by Fusing Text and Image Into a Single Structure

Marta Amorim; Frederico D. Bortoloti; Patrick M. Ciarelli; Evandro O. T. Salles; Daniel C. Cavalieri

doi:10.1109/ACCESS.2019.2939736

IEEE Access (Jan 2019)

Affiliations

Marta Amorim: Electrical Engineering Department, Federal University of Espírito Santo, Vitória, Brazil
Frederico D. Bortoloti: Electrical Engineering Department, Federal University of Espírito Santo, Vitória, Brazil
Patrick M. Ciarelli: Electrical Engineering Department, Federal University of Espírito Santo, Vitória, Brazil
Evandro O. T. Salles: Electrical Engineering Department, Federal University of Espírito Santo, Vitória, Brazil
Daniel C. Cavalieri: Automation and Control Engineering Department, Federal Institute of Espírito Santo, Serra, Brazil

Journal volume & issue: Vol. 7, pp. 132786–132802

Abstract

This work proposes an approach for detecting novelties that takes into account the temporal flow of data streams in social media. To this end, we present a new architecture for novelty detection that makes three contributions. First, we propose a new definition of novelty based on temporal windows. Second, we formulate an expression to quantify the quality of a novelty. Third, we introduce a new approach to fusing heterogeneous data (image + text): using the COCO dataset and the Mask R-CNN convolutional neural network, it transforms the image and text of social media posts into a single data format that machine learning algorithms can process. Since novelty detection is a task in which labeled samples are scarce or nonexistent, unsupervised algorithms are used; the following baseline and state-of-the-art algorithms were chosen: kNN, HBOS, FBagging, IForest, and autoencoders. The new fusion approach is also compared to a state-of-the-art outlier detection approach named AOM. Because of the temporal particularities of the task and the data types being fused, a new dataset was created, containing 27,494 tweets collected from Twitter. Our experiments show that classifying social media data with fused image and text is superior to using only text or only images as input.
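The abstract describes the fusion and the temporal-window novelty idea only at a high level. The Python sketch below illustrates one way they could fit together, under assumptions that are ours and not the paper's: an image is represented by the COCO class names an object detector such as Mask R-CNN might return (passed in as plain strings rather than computed), those names are appended to the tweet's words as prefixed tokens to form one bag-of-words vector, and a post in the current window is scored by its mean cosine distance to its k nearest posts in the previous window (a hand-rolled kNN novelty score). The helper names `fuse`, `cosine_distance`, and `knn_score`, the `img:` prefix, and the toy windows are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical prefix that keeps image-derived tokens distinct from text words
# (so the word "car" and a detected "car" object remain separate features).
IMG_PREFIX = "img:"


def fuse(text, detected_labels):
    """Fuse a post's text and its image into one bag-of-tokens structure.

    `detected_labels` stands in for the COCO class names an object detector
    (e.g. Mask R-CNN) might return for the attached image; here they are given directly.
    """
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    tokens = [w for w in words if w]
    tokens += [IMG_PREFIX + label.lower().replace(" ", "_") for label in detected_labels]
    return Counter(tokens)


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so tokens absent from b add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_score(doc, reference_window, k=2):
    """Novelty score: mean distance to the k nearest posts in the previous window."""
    distances = sorted(cosine_distance(doc, ref) for ref in reference_window)
    if not distances:
        return 1.0  # nothing to compare against: treat as maximally novel
    k = min(k, len(distances))
    return sum(distances[:k]) / k


if __name__ == "__main__":
    # Toy windows invented for illustration, not taken from the paper's dataset.
    previous_window = [
        fuse("Traffic jam on the bridge this morning", ["car", "truck"]),
        fuse("Heavy traffic downtown again", ["car", "bus"]),
        fuse("Cars everywhere, traffic is terrible", ["car"]),
    ]
    current_window = [
        fuse("Traffic on the bridge again", ["car", "truck"]),
        fuse("Look at this giraffe at the zoo", ["giraffe"]),
    ]
    for doc in current_window:
        print(round(knn_score(doc, previous_window), 3), sorted(doc))
```

In this toy run the giraffe post receives the higher score, since neither its words nor its detected object resemble the traffic posts of the previous window. Prefixing image tokens is one design choice among several; dropping the prefix would instead let a detected object reinforce a matching word in the text. A fuller pipeline would likely replace the hand-rolled scorer with one of the detectors the abstract lists (kNN, HBOS, FBagging, IForest, autoencoders).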

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639