IEEE Access (Jan 2024)
Analyzing Information Leakage on Video Object Detection Datasets by Splitting Images Into Clusters With High Spatiotemporal Correlation
Abstract
Random splitting is a common strategy for dividing a dataset into training, validation, and test subsets for deep-learning object detection algorithms. It is common for datasets to contain images extracted from video sources, which include frames with high spatial correlation, i.e., frames showing the same object in rotated positions or from different view angles. If these highly correlated frames are not well distributed across the subsets, they may lead to information leakage during training. This work shows that datasets built from highly spatially correlated frames of the same video suffer from information leakage when random splitting is used to distribute the images into the sub-datasets. A cluster-based dataset splitting algorithm is proposed, in which images are distributed randomly among the sub-datasets in packs, or clusters, instead of one image at a time. The clusters are created by extracting image features from each video of the dataset with a pre-trained image-text model, CLIP, and reducing the dimensionality of the feature vectors with t-Distributed Stochastic Neighbor Embedding (t-SNE). In this reduced-dimensional representation, images are separated into clusters using a clustering algorithm such as DBSCAN, OPTICS, or Agglomerative Clustering. These clusters are then distributed randomly among the training, validation, and test datasets to avoid information leakage caused by highly spatially correlated frames. YOLOv8 is used as the object detection algorithm to evaluate the dataset splitting.
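The cluster-wise distribution step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the cluster labels have already been computed (for instance by DBSCAN on the t-SNE embedding), and the function name `split_by_cluster`, the subset names, the 70/15/15 target fractions, and the handling of the noise label `-1` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def split_by_cluster(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters of images to train/val/test subsets.

    labels[i] is the cluster id of image i.
    Assumption: label -1 (the noise label used by DBSCAN/OPTICS) marks an
    unclustered image, treated here as its own single-image cluster.
    """
    names = ("train", "val", "test")

    # Group image indices by cluster id.
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        key = ("noise", idx) if lab == -1 else ("cluster", lab)
        clusters[key].append(idx)

    # Shuffle the clusters (not the individual images).
    groups = list(clusters.values())
    random.Random(seed).shuffle(groups)

    total = len(labels)
    split = {name: [] for name in names}
    for group in groups:
        # Place the cluster in the subset furthest below its target size,
        # so that a cluster is never divided between two subsets.
        deficits = [frac * total - len(split[name])
                    for name, frac in zip(names, fractions)]
        target = names[deficits.index(max(deficits))]
        split[target].extend(group)
    return split
```

Because clusters are moved as indivisible units, the resulting subset sizes only approximate the target fractions; the larger the clusters, the coarser the approximation.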
Keywords