LUMINA: Linguistic unified multimodal Indonesian natural audio-visual dataset

Eka Rahayu Setyaningsih; Anik Nur Handayani; Wahyu Sakti Gunawan Irianto; Yosi Kristian; Christian Trisno Sen Long Chen

Data in Brief (Jun 2024)

Affiliations

Eka Rahayu Setyaningsih: Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia; Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia
Anik Nur Handayani: Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia
Wahyu Sakti Gunawan Irianto: Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia
Yosi Kristian: Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia
Christian Trisno Sen Long Chen: Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia

Journal volume & issue: Vol. 54, p. 110279

Abstract

The LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual) Dataset is a carefully curated, constrained audio-visual dataset designed to support research on speech perception. All recordings are in Indonesian: LUMINA contains high-quality audio-visual recordings of 14 native speakers (9 male and 5 female), each contributing approximately 1,000 sentences. The videos are framed on the speakers' faces, capturing the visual cues and expressions that accompany speech. The dataset is intended as a resource for studying how humans perceive and process spoken language and for advancing speech recognition and synthesis technology.

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
