EPJ Web of Conferences (Jan 2024)

General purpose data streaming platform for log analysis, anomaly detection and security protection

  • Amori Francesco,
  • Antonelli Stefano,
  • Ciaschini Vincenzo,
  • Falabella Antonio,
  • Fattibene Enrico,
  • Fornari Federico,
  • Lattanzio Daniele,
  • Michelotto Diego,
  • Morganti Lucia

DOI
https://doi.org/10.1051/epjconf/202429501032
Journal volume & issue
Vol. 295
p. 01032

Abstract

INFN-CNAF is one of the Worldwide LHC Computing Grid (WLCG) Tier-1 data centres, providing computing, networking and storage resources to a wide variety of scientific collaborations, not limited to the four Large Hadron Collider (LHC) experiments. The INFN-CNAF data centre will move to a new location next year. At the same time, the requirements of our experiments and users are becoming increasingly challenging, and new scientific communities have started or will soon start exploiting our resources. We are currently re-engineering several services, in particular our monitoring infrastructure, to improve day-to-day operations and to cope with the increasing complexity of the use cases and with the future expansion of the centre. This scenario led us to implement a data streaming infrastructure designed to enable log analysis, anomaly detection, threat hunting, integrity monitoring and incident response. The platform has been organised to manage different kinds of data coming from heterogeneous sources, to support multi-tenancy and to be scalable. Moreover, we will be able to provide an on-demand, end-to-end data streaming application to users and communities that request such a facility. The infrastructure is based on the Apache Kafka platform, which provides streaming of events at large scale, with authentication and authorization configured at the topic level to ensure data isolation and protection. Data can be consumed by different applications, such as those devoted to log analysis, which can index large amounts of data and implement appropriate access policies for inspecting and visualising information. In this contribution we present and motivate our technological choices for the definition of the infrastructure, describe its components and illustrate use cases that can be addressed with this platform.
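
The idea of topic-level authorization for multi-tenant data isolation can be illustrated with a minimal sketch. This is a toy in-memory model in plain Python, not the Kafka API and not the configuration deployed at INFN-CNAF. The principal names, topic names and the `TopicAcl` and `TopicAuthorizer` classes are hypothetical and chosen only for illustration. The sketch assumes a deny-by-default policy, in which an operation on a topic is permitted only if an explicit rule grants it to the authenticated principal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicAcl:
    """One allow rule: a principal may perform an operation on a topic."""
    principal: str  # authenticated identity (hypothetical naming scheme)
    topic: str      # topic the rule applies to
    operation: str  # "READ" (consume) or "WRITE" (produce)


class TopicAuthorizer:
    """Toy authorizer: anything not explicitly allowed is denied."""

    def __init__(self, acls):
        self._acls = set(acls)

    def is_allowed(self, principal, topic, operation):
        # Exact-match lookup; no wildcards or prefixes in this sketch.
        return TopicAcl(principal, topic, operation) in self._acls


# Hypothetical tenants, each writing to its own topic, plus a
# log-analysis consumer that may only read tenant A's logs.
acls = [
    TopicAcl("User:tenant-a", "tenant-a.logs", "WRITE"),
    TopicAcl("User:tenant-a", "tenant-a.logs", "READ"),
    TopicAcl("User:tenant-b", "tenant-b.logs", "WRITE"),
    TopicAcl("User:log-indexer", "tenant-a.logs", "READ"),
]
authorizer = TopicAuthorizer(acls)

if __name__ == "__main__":
    checks = [
        ("User:tenant-a", "tenant-a.logs", "WRITE"),
        ("User:tenant-b", "tenant-a.logs", "READ"),
        ("User:log-indexer", "tenant-a.logs", "READ"),
        ("User:log-indexer", "tenant-a.logs", "WRITE"),
    ]
    for principal, topic, operation in checks:
        verdict = "allow" if authorizer.is_allowed(principal, topic, operation) else "deny"
        print(f"{principal} {operation} {topic}: {verdict}")
```

In this model, isolation between tenants follows from the absence of any rule that lets one tenant's principal read or write another tenant's topic. A consuming application such as a log indexer receives read-only access to the topics it needs and nothing more.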