Machine Learning-Based Network Anomaly Detection: Design, Implementation, and Evaluation
Pilar Schummer,
Alberto del Rio,
Javier Serrano,
David Jimenez,
Guillermo Sánchez,
Álvaro Llorente
Affiliations
Pilar Schummer
Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, 28040 Madrid, Spain
Alberto del Rio
Signals, Systems and Radiocommunications Department, Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, 28040 Madrid, Spain
Javier Serrano
Informatic Systems Department, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos (ETSISI), Universidad Politécnica de Madrid, 28031 Madrid, Spain
David Jimenez
Physical Electronics, Electrical Engineering and Applied Physics Department, Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, 28040 Madrid, Spain
Guillermo Sánchez
Global CTIO Unit, Telefónica Innovación Digital, 28050 Madrid, Spain
Álvaro Llorente
Signals, Systems and Radiocommunications Department, Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, 28040 Madrid, Spain
Background: Over the last decade, numerous methods have been proposed to define and detect outliers, particularly in complex environments such as networks, where anomalies deviate significantly from normal patterns. Although defining a clear standard remains challenging, anomaly detection systems have become essential for network administrators to identify and resolve irregularities efficiently. Methods: This study develops and evaluates a machine learning-based system for network anomaly detection, focusing on point anomalies in network traffic. It combines unsupervised and supervised learning techniques, including change point detection, clustering, and classification models, and uses SHAP values to improve model interpretability. Results: Unsupervised models effectively captured temporal patterns, while supervised models, particularly Random Forest (94.3% accuracy), classified anomalies accurately, with a predicted anomaly rate close to the actual one. Conclusions: Experimental results indicate that the system can accurately predict network anomalies in advance. Congestion and packet loss were identified as key factors in anomaly detection. These results support deploying the system in real-world networks, where its scalability can be validated.
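As a rough illustration of the change point detection mentioned in the Methods, the sketch below locates a single shift in the mean of a traffic metric by choosing the split that minimizes the within-segment squared error. It is not the system evaluated in this study: the abstract does not specify the detector used, and the function names (`segment_cost`, `find_change_point`), the `min_size` parameter, and the packet-loss values are all assumptions made up for this example.

```python
from statistics import fmean


def segment_cost(values):
    """Sum of squared deviations of a segment from its own mean."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def find_change_point(series, min_size=2):
    """Return the index that best splits the series into two segments
    with different means, or None if the series is too short.

    Hypothetical helper: a least-squares, single change point search.
    """
    best_index, best_cost = None, float("inf")
    # Try every split that leaves at least min_size points on each side.
    for k in range(min_size, len(series) - min_size + 1):
        cost = segment_cost(series[:k]) + segment_cost(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Made-up packet-loss percentages per measurement interval (illustrative only).
packet_loss = [0.4, 0.5, 0.3, 0.6, 0.4, 0.5, 4.8, 5.1, 4.9, 5.3, 5.0]
change_index = find_change_point(packet_loss)
print(change_index)
```

With this made-up series, the split falls at the first interval where packet loss jumps. A production system would more likely need multiple change points and a penalty term to avoid over-segmentation, which this sketch deliberately leaves out.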