Applied Sciences (Jun 2025)
Context-Aware Multimodal Fusion with Sensor-Augmented Cross-Modal Learning: The BLAF Architecture for Robust Chinese Homophone Disambiguation in Dynamic Environments
Abstract
Chinese, a tonal language with pervasive homophonic ambiguity, poses significant challenges for semantic disambiguation in natural language processing (NLP), hindering applications such as speech recognition, dialogue systems, and assistive technologies. Traditional static disambiguation methods adapt poorly to dynamic environments and low-frequency scenarios, limiting their real-world utility. To address these limitations, we propose BLAF, a novel MacBERT-BiLSTM hybrid architecture that synergizes global semantic understanding with local sequential dependencies through dynamic multimodal feature fusion. The framework incorporates mechanisms for the principled weighting of heterogeneous features, effective alignment of representations, and sensor-augmented cross-modal learning to enhance robustness, particularly in noisy environments. Employing a staged optimization strategy, BLAF achieves state-of-the-art performance on the SIGHAN 2015 dataset (with fine-tuning and data supplementation): 93.37% accuracy and a 93.25% F1 score, surpassing a pure BERT baseline by 15.74% in accuracy. Ablation studies confirm the critical contributions of the integrated components. Furthermore, the sensor-augmented module significantly improves robustness under noise, raising speech SNR to 18.6 dB at 75 dB ambient noise and reducing the word error rate by 12.7%. By bridging gaps among tonal phonetics, contextual semantics, and computational efficiency, BLAF establishes a scalable paradigm for robust Chinese homophone disambiguation in industrial NLP applications. This work advances cognitive intelligence in Chinese NLP and provides a blueprint for adaptive disambiguation in resource-constrained and dynamic scenarios.
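The dynamic multimodal feature fusion named in the abstract can be illustrated with a minimal, self-contained sketch of one plausible gated-fusion step that combines a global-semantic vector with a local sequential vector. The gating form, function names, and toy dimensions below are illustrative assumptions for exposition only, not the paper's exact formulation or trained weights:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(sem, seq, w_sem, w_seq, b):
    """Fuse a global-semantic feature vector (e.g. a MacBERT output) with a
    local sequential feature vector (e.g. a BiLSTM output) via a learned
    per-dimension gate. All parameters here are illustrative placeholders,
    not weights from the BLAF model."""
    fused = []
    for s, q, ws, wq, bi in zip(sem, seq, w_sem, w_seq, b):
        g = sigmoid(ws * s + wq * q + bi)    # gate value in (0, 1)
        fused.append(g * s + (1.0 - g) * q)  # convex combination per dimension
    return fused

# Toy example: 4-dimensional feature vectors
sem = [0.8, -0.2, 0.5, 1.0]
seq = [0.1, 0.4, -0.3, 0.9]
out = gated_fusion(sem, seq, w_sem=[1.0] * 4, w_seq=[1.0] * 4, b=[0.0] * 4)
```

Because the gate is a sigmoid, each fused coordinate is a convex combination of the two modality features, so the fusion interpolates rather than extrapolates; a trained model would learn the gate parameters to weight whichever modality is more reliable per dimension.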
Keywords