Applied Sciences (Jun 2025)
Context-Aware Multimodal Fusion with Sensor-Augmented Cross-Modal Learning: The BLAF Architecture for Robust Chinese Homophone Disambiguation in Dynamic Environments
Abstract
Chinese, a tonal language with pervasive homophonic ambiguity, poses significant challenges for semantic disambiguation in natural language processing (NLP), hindering applications such as speech recognition, dialogue systems, and assistive technologies. Traditional static disambiguation methods adapt poorly to dynamic environments and low-frequency scenarios, limiting their real-world utility. To address these limitations, we propose BLAF, a novel MacBERT-BiLSTM hybrid architecture that synergizes global semantic understanding with local sequential dependencies through dynamic multimodal feature fusion. The framework incorporates mechanisms for the principled weighting of heterogeneous features, effective alignment of representations, and sensor-augmented cross-modal learning to enhance robustness, particularly in noisy environments. Employing a staged optimization strategy, BLAF achieves state-of-the-art performance on the SIGHAN 2015 dataset (with fine-tuning and data supplementation): 93.37% accuracy and a 93.25% F1 score, surpassing a pure BERT baseline by 15.74% in accuracy. Ablation studies confirm the critical contributions of the integrated components. Furthermore, the sensor-augmented module significantly improves robustness under noise, raising speech SNR to 18.6 dB at 75 dB ambient noise and reducing the word error rate by 12.7%. By bridging gaps among tonal phonetics, contextual semantics, and computational efficiency, BLAF establishes a scalable paradigm for robust Chinese homophone disambiguation in industrial NLP applications. This work advances cognitive intelligence in Chinese NLP and provides a blueprint for adaptive disambiguation in resource-constrained and dynamic scenarios.
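The dynamic multimodal feature fusion named in the abstract can be illustrated with a minimal, self-contained sketch of one plausible gated-fusion step that combines a global-semantic vector with a local sequential vector. The gating form, function names, and toy dimensions below are illustrative assumptions for exposition only, not the paper's exact formulation or trained weights:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(sem, seq, w_sem, w_seq, b):
    """Fuse a global-semantic feature vector (e.g. a MacBERT output) with a
    local sequential feature vector (e.g. a BiLSTM output) via a learned
    per-dimension gate. All parameters here are illustrative placeholders,
    not weights from the BLAF model."""
    fused = []
    for s, q, ws, wq, bi in zip(sem, seq, w_sem, w_seq, b):
        g = sigmoid(ws * s + wq * q + bi)    # gate value in (0, 1)
        fused.append(g * s + (1.0 - g) * q)  # convex combination per dimension
    return fused

# Toy example: 4-dimensional feature vectors
sem = [0.8, -0.2, 0.5, 1.0]
seq = [0.1, 0.4, -0.3, 0.9]
out = gated_fusion(sem, seq, w_sem=[1.0] * 4, w_seq=[1.0] * 4, b=[0.0] * 4)
```

Because the gate is a sigmoid, each fused coordinate is a convex combination of the two modality features, so the fusion interpolates rather than extrapolates; a trained model would learn the gate parameters to weight whichever modality is more reliable per dimension.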
Keywords