Open Research Europe (Feb 2022)

Natural language processing for aviation safety: extracting knowledge from publicly-available loss of separation reports [version 2; peer review: 2 approved]

  • Patricia Ruiz Martino,
  • Luca Oneto,
  • Irene Buselli,
  • Christian Verdonk Gallego,
  • Carlo Dambra,
  • Anthony Smoker,
  • Miguel García Martínez,
  • Tamara Pejovic,
  • Nnenna Ike

Journal volume & issue
Vol. 1

Abstract

Background: The air traffic management (ATM) system has historically coped with a global increase in traffic demand, which has ultimately led to increased operational complexity. To understand the impact of this growing complexity on system safety, it is crucial to analyse losses of separation (LoSs) automatically, using tools able to extract meaningful and actionable information from safety reports. Current research in this field mainly uses natural language processing (NLP) to categorise the reports, with two limitations: the categories considered must be manually annotated by experts, and general taxonomies are seldom exploited.

Methods: To address these gaps, the authors propose first to perform exploratory data analysis on safety reports by combining state-of-the-art techniques such as topic modelling and clustering, and then to develop an algorithm, based on syntactic analysis, that extracts the Toolkit for ATM Occurrence Investigation (TOKAI) taxonomy factors from the free-text safety reports. TOKAI is an investigation tool developed by EUROCONTROL, and its taxonomy is intended to become a standard, harmonised approach to future investigations.

Results: Using the LoS events reported in the public databases of the Comisión de Estudio y Análisis de Notificaciones de Incidentes de Tránsito Aéreo and the United Kingdom Airprox Board, the authors show that their proposal can automatically extract meaningful and actionable information from safety reports, as well as classify their content according to the TOKAI taxonomy. The quality of the approach is also indirectly validated by checking the connection between the identified factors and the main contributor to each incident.

Conclusions: The authors' results are a promising first step toward the full automation of a general analysis of LoS reports, supported by results on real-world data from two different sources. In the future, the authors' proposal could be extended to other taxonomies or tailored to identify factors to be included in the safety taxonomies.
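To make the idea of mapping free-text reports to taxonomy factors concrete, the following is a minimal sketch and not the authors' algorithm. It substitutes simple regular-expression matching for the syntactic analysis described in the Methods. The factor labels, the trigger patterns, the function name `extract_factors` and the sample report are all hypothetical and are not taken from the TOKAI taxonomy or from either database.

```python
import re
from collections import Counter

# Hypothetical factor labels and trigger patterns, for illustration only.
# They are NOT taken from the TOKAI taxonomy.
FACTOR_PATTERNS = {
    "communication": re.compile(r"\b(readback|frequency|callsign|misheard)\b", re.I),
    "detection": re.compile(r"\b(did not (see|notice|detect)|late sighting)\b", re.I),
    "workload": re.compile(r"\b(high workload|busy|traffic load)\b", re.I),
}


def extract_factors(report: str) -> Counter:
    """Count, per hypothetical factor, the sentences that match its pattern."""
    counts = Counter()
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        for factor, pattern in FACTOR_PATTERNS.items():
            if pattern.search(sentence):
                counts[factor] += 1
    return counts


# Invented sample text, not a real safety report.
report = (
    "The controller was busy with a high traffic load. "
    "The pilot's readback was incorrect and was not corrected. "
    "The crew did not see the other aircraft until late."
)
factors = extract_factors(report)
print(dict(factors))
```

A keyword matcher like this cannot tell "the crew did not see the traffic" from "the crew reported that they did see the traffic", which is one reason a syntax-aware approach of the kind the abstract describes may be preferable in practice.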

Keywords