Enhancing data efficiency for autonomous vehicles: Using data sketches for detecting driving anomalies

Debbie Aisiana Indah; Judith Mwakalonge; Gurcan Comert; Saidi Siuhi

Machine Learning with Applications (Mar 2024)

Affiliations

Debbie Aisiana Indah: Department of Engineering, South Carolina State University, 300 College Avenue, Orangeburg, 29117, SC, USA; Corresponding author.
Judith Mwakalonge: Department of Engineering, South Carolina State University, 300 College Avenue, Orangeburg, 29117, SC, USA
Gurcan Comert: Department of Computer Science & Engineering, Benedict College, 1600 Harden St., Columbia, 29204, SC, USA
Saidi Siuhi: Department of Engineering, South Carolina State University, 300 College Avenue, Orangeburg, 29117, SC, USA

Journal volume & issue: Vol. 15, p. 100530

Abstract

Machine learning models for near-collision detection in autonomous vehicles promise enhanced predictive power. However, training such models on large datasets presents storage and computational challenges, particularly on conventional computing systems. This paper addresses the problem of training anomaly detection models from large-scale vehicle trajectory datasets and adopts a reservoir sampling-based data sketching technique. Predetermined subset sizes ranging from 0.4% to 100% of the original data are utilized. A single-pass reservoir sampling algorithm is then applied to construct these data subsets efficiently. Subsequently, a Support Vector Machine (SVM) model is trained on these subsets, and its performance is assessed by various metrics, including accuracy, precision, recall, and F1-score. Experimental outcomes on the HighD dataset, a comprehensive real-world collection of vehicle trajectories, confirm that our approach can achieve robust near-collision detection. With the full dataset, our model achieved an F1-score of 0.9998 for class 0 and 0.9984 for class 1. When the data was reduced to as low as 0.4% of the original size, the F1-score remained at 0.9998 for class 0 and fell to 0.7143 for class 1. This demonstrates a capability to maintain relatively high performance even with a 99.6% reduction in data size. Moreover, precision and recall values ranged from 0.713 to 0.999 across varying sketch sizes.
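The single-pass reservoir sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which variant the authors used, so the code below assumes the classic Algorithm R; the function name `reservoir_sample`, the `seed` parameter, and the toy stream of 100,000 integer row indices are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random from a stream in a single pass.

    Assumption: classic Algorithm R, not necessarily the paper's exact variant.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for trajectory rows: 100,000 row indices (hypothetical data).
rows = range(100_000)
# A 0.4% sketch, matching the smallest subset size mentioned in the abstract.
sketch = reservoir_sample(rows, k=400)
print(len(sketch))
```

Because the reservoir holds only `k` items at any time, memory use depends on the sketch size rather than on the size of the full dataset. The resulting subset could then be passed to any classifier, such as an SVM, in place of the full data.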

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
