Applied Sciences (Aug 2024)
GCE: An Audio-Visual Dataset for Group Cohesion and Emotion Analysis
Abstract
We present the Group Cohesion and Emotion (GCE) dataset, which comprises 1029 video segments sourced from YouTube. The videos cover a range of group interactions, including interviews, meetings, informal discussions, and similar contexts. Graduate psychology students annotated each 30 s segment with a cohesion level from 1 to 7 and an affective state of negative, neutral, or positive. We introduce a baseline model that uses visual and audio embeddings to predict group cohesion and group emotion, with Multi-Head Attention (MHA) fusion applied to enhance cross-representation learning. We evaluate both unimodal and multimodal approaches on the two tasks. The results indicate that the GCE dataset is effective for studying group cohesion and group emotional states.
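As a rough illustration of the MHA fusion idea mentioned above, the following minimal sketch lets visual tokens attend over audio tokens using multi-head scaled dot-product attention. It is not the authors' implementation: the function names, the token counts, the embedding size, and the randomly initialized projection matrices (which would be learned in a trained model) are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection matrix."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_cross_attention(queries, context, num_heads, rng):
    """Fuse two modalities: `queries` tokens attend over `context` tokens."""
    d_model = len(queries[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Random projections (assumption): a trained model would learn these.
    w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
    q, k, v = matmul(queries, w_q), matmul(context, w_k), matmul(context, w_v)

    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_t = [list(col) for col in zip(*k_h)]  # transpose keys
        scores = matmul(q_h, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        heads.append(matmul(weights, v_h))

    # Concatenate the heads per query token, then apply the output projection.
    concat = [sum((head[i] for head in heads), []) for i in range(len(queries))]
    return matmul(concat, w_o)


rng = random.Random(0)
# Assumed toy inputs: 4 visual tokens and 6 audio tokens, each of dimension 8.
visual = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
audio = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]

fused = multi_head_cross_attention(visual, audio, num_heads=2, rng=rng)
print(len(fused), len(fused[0]))
```

The fused output has one vector per visual token, each carrying information from the audio stream. In a full model, such vectors would typically be pooled and passed to separate heads for the cohesion and emotion predictions.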
Keywords