Abstract:
Multimodal Emotion Recognition in Conversation (MER) aims to identify emotions in conversation by integrating text, audio, and visual information, and plays a crucial role in dialogue systems, recommendation systems, and healthcare. Existing methods overlook the contribution discrepancy among modalities when integrating multimodal information, leading to imbalanced optimization and, in turn, suboptimal performance. To address this problem, we propose a Debiased Hierarchical Knowledge Distillation (DHKD) framework that enhances the contributions of weak modalities and mitigates the contribution discrepancy through knowledge distillation, achieving balanced optimization. As the major contribution of our proposed model, the debiased hierarchical knowledge distillation transfers knowledge from the fused multimodal representation to the individual modalities at both the feature and logit levels. It boosts the contribution of single modalities, especially those with weak contributions. Extensive experiments on the IEMOCAP and MELD datasets demonstrate the effectiveness of our approach. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
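The abstract does not reproduce the paper's loss formulation, so as a rough illustration only, the sketch below shows what "distillation at both the feature and logit levels" typically looks like in general: an MSE term aligning a unimodal (student) feature with the fused multimodal (teacher) feature, plus a temperature-scaled KL term on the class logits. All function names and the temperature value are assumptions, not the authors' definitions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_distill_loss(student_feat, teacher_feat):
    """Feature-level term (assumed MSE): pull the unimodal feature
    toward the fused multimodal feature."""
    s = np.asarray(student_feat, dtype=float)
    t = np.asarray(teacher_feat, dtype=float)
    return float(np.mean((s - t) ** 2))

def logit_distill_loss(student_logits, teacher_logits, T=2.0):
    """Logit-level term (assumed KL divergence): match the student's
    softened class distribution to the teacher's; the T*T factor keeps
    gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) * T * T)
```

Under such a scheme, each unimodal branch would add these two terms to its task loss, with larger distillation weights plausibly assigned to weaker modalities to rebalance optimization.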
Source:
ISSN: 0302-9743
Year: 2025
Volume: 15859 LNCS
Page: 62-74
Language: English
Impact Factor: 0.402 (JCR@2005)