Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Oct 2024)
Creation and analysis of a multimodal corpus for aggressive behavior recognition
Abstract
The development of digital communication systems is associated with an increasing number of disruptive behavior incidents that require a rapid response to prevent negative consequences. Because human aggression is weakly formalized, machine learning approaches are the most suitable for this area. These approaches require representative sets of relevant data for efficient aggression recognition. Developing such datasets raises several problems: the relevance of dataset labels to real behavior, the consistency of the situations in which behavior is manifested, and the naturalness of the behavior. The purpose of this work is to develop a methodology for creating aggressive behavior datasets that reflects the key aspects of aggression and provides relevant data. The work presents the developed methodology for creating multimodal datasets of natural aggressive behavior. The analysis of the subject area of human aggression substantiates the key aspects of its manifestations (the presence of a subject and an object of aggression, and the destructiveness of the aggressive action); defines the behavior analysis units as time intervals of audio and video with localized informants; defines the types of aggression considered (physical and verbal overt direct aggression); and substantiates the criteria for assessing aggressive behavior as a set of aggressive actions that define each aggression type. The methodology consists of the following stages: collecting videos from the Internet, identifying the time intervals in which aggression occurs, localizing informants in video frames, transcribing the informants' speech, collective labeling of physical and verbal aggressive actions by a group of annotators (raters), and assessing the reliability of annotation agreement using Fleiss' kappa coefficient. To evaluate the methodology, a new corpus of audiovisual aggressive behavior in online streams (AVABOS) was collected and labeled.
The dataset contains audio and video segments with verbal and physical aggression, respectively, manifested by Russian-speaking informants during online video streams. The interrater agreement results show substantial agreement for physical aggression (κ = 0.74) and moderate agreement for verbal aggression (κ = 0.48), which supports the developed methodology. The AVABOS dataset can be used in automatic aggression recognition tasks. The developed methodology can also be used to create datasets for other types of behavior.
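The agreement statistic used in the final labeling stage can be illustrated with a short sketch. It implements the standard Fleiss' kappa formula for N items, each rated by the same number n of raters into k categories: observed agreement P̄ is the mean share of agreeing rater pairs per item, chance agreement P_e is the sum of squared category proportions, and κ = (P̄ − P_e) / (1 − P_e). The function names and the toy rating table below are hypothetical and are not taken from the AVABOS pipeline. The verbal labels assume the Landis and Koch (1977) scale, which is consistent with, but not confirmed by, the terms "substantial" and "moderate" used above.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for an N x k table, where counts[i][j] is the number of
    raters who assigned item i (e.g. a video segment) to category j.
    Every item must be rated by the same number of raters n >= 2."""
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_cats = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(len(row) != n_cats or sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same categories and the same number of raters (>= 2)")
    total = n_items * n_raters
    # p_j: share of all rating assignments that went to category j
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # P_i: proportion of rater pairs that agree on item i
    P = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(P) / n_items          # observed agreement
    p_e = sum(pj * pj for pj in p)    # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for kappa, assuming the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy table: 4 segments, 3 raters,
# categories [not aggressive, aggressive].
toy = [[3, 0], [0, 3], [2, 1], [3, 0]]
k = fleiss_kappa(toy)
print(round(k, 3), landis_koch(k))  # → 0.625 substantial
```

Under this scale, κ = 0.74 falls in the 0.61 to 0.80 band ("substantial") and κ = 0.48 falls in the 0.41 to 0.60 band ("moderate").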
Keywords