Abstract:
Emotion Recognition in Conversation (ERC) has emerged as a pivotal topic in human-computer interaction, attracting increasing attention. Although previous research has made progress, most approaches treat every modality equally, failing to differentiate the emotional information carried by different modalities and thus struggling to exploit the complementary and associative information within multimodal data. To address this issue, this paper proposes a Cross-Modal Fusion Network with Gated Units (CFN-GU). CFN-GU comprises two main components: the Single-Modal Transformer and the Learnable Fusion Strategy With Gate (LG-Fusion). The Single-Modal Transformer models contextual information for each unimodal feature sequence, extracting rich contextual emotional cues. LG-Fusion then adaptively learns a weight for each modality's features, capturing how much each modality contributes to the emotional information, and the three modalities are fused according to these learned weights. CFN-GU achieves an F1 score of 64.3% on IEMOCAP, effectively improving ERC performance and outperforming all benchmark baselines. © 2024 IEEE.
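The abstract gives no implementation details, so the following is a minimal sketch of what a learnable gated fusion over three modality streams could look like, assuming PyTorch encoder outputs of equal dimension. The module name (GatedFusion), the dimension d_model, and the softmax-normalized per-modality gates are illustrative assumptions, not the authors' published LG-Fusion code.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical sketch of a learnable gated fusion over three
    modality streams (text, audio, visual). Not the authors' code."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # One gate network per modality; each scores its stream's contribution.
        self.gates = nn.ModuleList([nn.Linear(d_model, 1) for _ in range(3)])
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, text, audio, visual):
        # Each input: (batch, seq_len, d_model), e.g. single-modal Transformer outputs.
        streams = [text, audio, visual]
        # Per-modality scalar scores, normalized across the three modalities.
        scores = torch.cat([g(s) for g, s in zip(self.gates, streams)], dim=-1)
        weights = torch.softmax(scores, dim=-1)  # (batch, seq_len, 3)
        # Weighted sum of the three streams, then a final projection.
        fused = sum(weights[..., i:i + 1] * s for i, s in enumerate(streams))
        return self.proj(fused)

# Usage with dummy features standing in for three single-modal encoders.
b, t, d = 4, 20, 256
fusion = GatedFusion(d_model=d)
out = fusion(torch.randn(b, t, d), torch.randn(b, t, d), torch.randn(b, t, d))
print(out.shape)  # torch.Size([4, 20, 256])

Normalizing the gates with a softmax across modalities (rather than independent sigmoids) is one design choice consistent with the abstract's description of weighing each modality's contribution; the paper may use a different gating formulation.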
Year: 2024
Page: 14-21
Language: English