A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description [version 1; peer review: 1 approved, 2 approved with reservations]

Christian Olaf Häusler; Michael Hanke

doi:10.12688/f1000research.27621.1

F1000Research (Jan 2021)

Affiliations

Christian Olaf Häusler: Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Nordrhein-Westfalen, 52425, Germany
Michael Hanke: Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Nordrhein-Westfalen, 52425, Germany

DOI: https://doi.org/10.12688/f1000research.27621.1
Journal volume & issue: Vol. 10

Abstract

Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2,500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), and 66,000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embeddings that represent each word in a 300-dimensional semantic space. To validate the dataset’s quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation’s content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
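
As a rough illustration of how timed, tagged word events of this kind might feed a model of brain activity, the following minimal sketch counts word onsets per fixed time bin, optionally restricted to one part-of-speech tag. The record layout (`WordEvent`), the tag labels, the speaker label, the example rows, and the 2-second bin width are all assumptions made for illustration; they are not taken from the published annotation files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordEvent:
    """Hypothetical record for one annotated word (field names are assumed)."""
    onset: float     # seconds from movie start
    duration: float  # seconds
    text: str
    speaker: str
    lemma: str
    pos: str         # simple part-of-speech tag


def count_per_bin(events: List[WordEvent], bin_seconds: float,
                  n_bins: int, pos: Optional[str] = None) -> List[int]:
    """Count word onsets per time bin, optionally keeping only one POS tag."""
    counts = [0] * n_bins
    for ev in events:
        if pos is not None and ev.pos != pos:
            continue
        idx = int(ev.onset // bin_seconds)
        if 0 <= idx < n_bins:  # ignore events outside the covered window
            counts[idx] += 1
    return counts


# Made-up example rows, not real annotation data.
events = [
    WordEvent(0.5, 0.3, "Hallo", "FORREST", "hallo", "INTJ"),
    WordEvent(1.2, 0.4, "Mama", "FORREST", "Mama", "NOUN"),
    WordEvent(2.1, 0.2, "sagt", "FORREST", "sagen", "VERB"),
    WordEvent(4.7, 0.5, "Pralinen", "FORREST", "Praline", "NOUN"),
]

all_counts = count_per_bin(events, bin_seconds=2.0, n_bins=3)
noun_counts = count_per_bin(events, bin_seconds=2.0, n_bins=3, pos="NOUN")
```

Per-bin counts like these are only one possible starting point; a hemodynamic model would typically also convolve such a time series with a response function, which this sketch does not attempt.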

Published in F1000Research

ISSN: 2046-1402 (Online)
Publisher: F1000 Research Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://f1000research.com
