Abstract:
Multimodal emotion recognition leverages cross-modal information to improve model performance. In practical applications, however, missing modalities often degrade recognition accuracy because of the modality gap arising from differences in information representation and semantic inconsistencies across modalities. To address this challenge, this paper introduces a specific and invariant feature learning approach (SIFL). Specifically, we extract modality-specific features with a self-attention mechanism and modality-invariant features with a denoising autoencoder, enhancing semantic richness and expressiveness. Additionally, we develop a reconstruction network to generate high-quality modality features. To further optimize the process, we design and implement multiple optimization objectives that effectively bridge the semantic gap between modalities. Experimental results on the CMU-MOSI dataset demonstrate that the proposed method surpasses current mainstream baselines and remains robust, particularly under high missing rates, validating its efficacy and versatility in handling missing modalities. © 2025 IEEE.
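The two feature extractors named in the abstract can be sketched in minimal form: scaled dot-product self-attention for modality-specific features, and a denoising autoencoder (corrupt the input, encode, reconstruct) for invariant features. This is an illustrative sketch only; all shapes, layer sizes, and names below are assumptions, not the paper's actual SIFL architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over one modality.

    X: (seq_len, d) sequence of unimodal features.
    Returns attention-weighted features of the same shape
    (a stand-in for the modality-specific branch).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len) attention weights
    return A @ V

class DenoisingAutoencoder:
    """Hypothetical minimal DAE: add Gaussian noise, encode, decode.

    The hidden code h plays the role of the invariant features;
    the decoder output approximates the clean input.
    """
    def __init__(self, d_in, d_hid, rng):
        self.We = rng.standard_normal((d_in, d_hid)) * 0.1
        self.Wd = rng.standard_normal((d_hid, d_in)) * 0.1

    def forward(self, x, noise_std, rng):
        x_noisy = x + rng.standard_normal(x.shape) * noise_std
        h = np.tanh(x_noisy @ self.We)    # invariant features (assumed d_hid-dim)
        x_rec = h @ self.Wd               # reconstruction of the clean input
        return h, x_rec

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))           # toy sequence: 5 steps, dim 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
specific = self_attention(X, Wq, Wk, Wv)  # modality-specific features, (5, 8)
dae = DenoisingAutoencoder(8, 4, rng)
invariant, recon = dae.forward(X, noise_std=0.1, rng=rng)
print(specific.shape, invariant.shape, recon.shape)
```

In the paper's setting these two feature sets would then feed a reconstruction network and multiple training objectives; here the sketch stops at the forward passes.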
Year: 2025
Page: 533-537
Language: English