LUMINA: Linguistic unified multimodal Indonesian natural audio-visual dataset
Eka Rahayu Setyaningsih,
Anik Nur Handayani,
Wahyu Sakti Gunawan Irianto,
Yosi Kristian,
Christian Trisno Sen Long Chen
Affiliations
Eka Rahayu Setyaningsih
Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia; Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia
Anik Nur Handayani
Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia
Wahyu Sakti Gunawan Irianto
Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Semarang Street 5, Malang, 65145, East Java, Indonesia
Yosi Kristian
Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia
Christian Trisno Sen Long Chen
Institut Sains dan Teknologi Terpadu Surabaya, Ngagel Jaya Tengah Street 73 – 77, Surabaya 60284, East Java, Indonesia
The LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual) dataset is a curated, constrained audio-visual dataset designed to support research on speech perception. All speech is in Indonesian. The dataset contains high-quality audio-visual recordings of 14 native speakers (9 male and 5 female), each contributing approximately 1,000 sentences, or roughly 14,000 sentences in total. The videos focus on the speaker's face, capturing the visual cues and expressions that accompany speech. LUMINA is a resource for studying how humans perceive and process spoken language, and it can support advances in speech recognition and speech synthesis technology.