IEEE Access (Jan 2019)
Novelty Detection in Social Media by Fusing Text and Image Into a Single Structure
Abstract
This work proposes an approach for detecting novelties that takes into account the temporal flow of data streams in social media. To this end, we present a new architecture for novelty detection that makes three contributions. First, we propose a definition of novelty based on temporal windows. Second, we formulate an expression to determine the quality of a novelty. Third, we introduce an approach to the fusion of heterogeneous data (image + text), using the COCO dataset and the Mask R-CNN convolutional neural network, which transforms image and text from social media into a single data format ready to be processed by machine learning algorithms. Because novelty detection is a task in which labeled samples are scarce or nonexistent, unsupervised algorithms are used; the following baseline and state-of-the-art algorithms were chosen: kNN, HBOS, FBagging, IForesting, and autoencoders. The new fusion approach is also compared with AOM, a state-of-the-art approach to outlier detection. Because of the temporal particularities and the data types being fused, a new dataset was created, containing 27,494 tweets collected from Twitter. Our experiments show that classifying social media data using data fusion is superior to using only text or only images as input.
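As a rough illustration of the fusion idea summarized above, the following minimal sketch merges a tweet's words and the object labels detected in its image into one bag of tokens, then scores a new sample by its mean cosine distance to the nearest samples of an earlier temporal window. Everything here is an assumption made for illustration: the abstract does not specify the fused format, the `img_` prefix, the distance measure, or the scoring rule, and the functions `fuse`, `cosine_distance`, and `knn_score` are hypothetical names. The object labels are supplied by hand rather than produced by Mask R-CNN.

```python
from collections import Counter
import math


def fuse(tweet_text, detected_labels):
    """Merge tweet words and image object labels into one bag of tokens.

    Assumption: image content is represented by object labels (e.g. COCO
    class names), prefixed with 'img_' so they stay distinct from words.
    """
    tokens = tweet_text.lower().split()
    tokens += ["img_" + label.lower().replace(" ", "_") for label in detected_labels]
    return Counter(tokens)


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two token-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_score(sample, window, k=2):
    """Mean distance to the k nearest samples of a reference window.

    Higher values mean the sample looks less like the earlier window.
    """
    if not window:
        return 1.0
    nearest = sorted(cosine_distance(sample, other) for other in window)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical earlier temporal window of fused samples.
window = [
    fuse("traffic jam downtown", ["car", "bus"]),
    fuse("heavy traffic again downtown", ["car", "truck"]),
]

familiar = fuse("traffic downtown", ["car"])
unusual = fuse("flood near the river", ["boat"])

familiar_score = knn_score(familiar, window)
unusual_score = knn_score(unusual, window)
```

In this toy data the unusual sample shares no token with the window, so its score is higher than that of the familiar sample. A real pipeline would replace the hand-written labels with detector output and the simple kNN score with one of the unsupervised detectors listed above.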
Keywords