Journal of Open Humanities Data (Jan 2024)

When Text and Speech are Not Enough: A Multimodal Dataset of Collaboration in a Situated Task

  • Ibrahim Khebour,
  • Richard Brutti,
  • Indrani Dey,
  • Rachel Dickler,
  • Kelsey Sikes,
  • Kenneth Lai,
  • Mariah Bradford,
  • Brittany Cates,
  • Paige Hansen,
  • Changsoo Jung,
  • Brett Wisniewski,
  • Corbyn Terpstra,
  • Leanne Hirshfield,
  • Sadhana Puntambekar,
  • Nathaniel Blanchard,
  • James Pustejovsky,
  • Nikhil Krishnaswamy

DOI
https://doi.org/10.5334/johd.168
Journal volume & issue
Vol. 10
pp. 7–7

Abstract

Modeling the information exchanged in real human-human interactions adequately requires more than speech or text alone. The channels that contribute to the “making of sense” in human-human interaction include, but are not limited to, gesture, speech, user-interaction modeling, gaze, joint attention, and involvement/engagement, all of which must be modeled adequately to extract correct and meaningful information automatically. In this paper, we present a multimodal dataset of a novel situated and shared collaborative task, with the above channels annotated to encode these different aspects of the participants' situated and embodied involvement in the joint activity.
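As an illustration only, a time-aligned segment of such a multimodal corpus could be represented along the lines of the sketch below. The field names and label formats are hypothetical assumptions, not the dataset's published schema; only the list of channels follows the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedSegment:
    """Hypothetical record for one time-aligned segment of a multimodal corpus.

    Field names are illustrative assumptions, not the published annotation
    scheme; the channels mirror those named in the abstract (speech, gesture,
    gaze, joint attention, involvement/engagement, user interaction).
    """
    start_time: float                                       # segment start, seconds
    end_time: float                                         # segment end, seconds
    speaker: Optional[str] = None                           # participant ID, if anyone speaks
    transcript: Optional[str] = None                        # speech/text content
    gestures: List[str] = field(default_factory=list)       # e.g. deictic pointing events
    gaze_targets: List[str] = field(default_factory=list)   # objects/participants looked at
    joint_attention: Optional[str] = None                   # shared focus of attention, if any
    engagement: Optional[str] = None                        # involvement/engagement label
    actions: List[str] = field(default_factory=list)        # task/user-interaction events


# Minimal usage example with made-up values
segment = AnnotatedSegment(
    start_time=12.4,
    end_time=15.1,
    speaker="P1",
    transcript="Put the red block on the scale.",
    gestures=["point:red_block"],
    gaze_targets=["red_block", "P2"],
    joint_attention="red_block",
    engagement="high",
    actions=["P2:moves(red_block, scale)"],
)
print(segment)
```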

Keywords