Abstract:
Multimodal emotion recognition in conversations aims to detect emotions accurately by integrating audio, text, and video modalities, and plays an important role in various systems. Existing approaches use convolutional and recurrent networks to learn short-term emotional information from individual modalities, or employ graph and attention mechanisms to integrate long-term emotional information across multiple modalities. These methods effectively combine the emotional information of conversational content in the time domain. However, psychological research shows that emotional information is conveyed not only in the time domain but also in the frequency domain (e.g., pitch and speech rate). To capture emotions from a more comprehensive perspective, we propose TF-MERC, a framework that integrates both the time and frequency domains. TF-MERC uses a multi-domain alignment module to learn modality information within the time and frequency domains, and then employs FATransformer to deeply integrate the multimodal associations between the two domains, providing a more comprehensive basis for emotion prediction. Experimental results show that TF-MERC outperforms state-of-the-art methods, achieving superior performance across multiple datasets. © 2025 ACM.
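The record gives no implementation details, so the sketch below is only a rough, hypothetical illustration of the time-frequency fusion idea described in the abstract (PyTorch assumed; the module name TimeFrequencyFusion, the feature sizes, and the use of an FFT plus cross-attention are all invented for illustration and are not the authors' FATransformer or TF-MERC code).

```python
import torch
import torch.nn as nn

class TimeFrequencyFusion(nn.Module):
    """Illustrative sketch: fuse a time-domain utterance-feature sequence
    with a frequency-domain view of the same sequence via cross-attention.
    (Hypothetical module, not the TF-MERC reference implementation.)"""

    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 7):
        super().__init__()
        self.freq_proj = nn.Linear(2 * dim, dim)      # real+imag parts -> dim
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) time-domain features, one row per utterance
        spec = torch.fft.rfft(x, dim=1)               # FFT along the time axis
        freq = torch.cat([spec.real, spec.imag], dim=-1)
        freq = self.freq_proj(freq)                   # (batch, seq_len//2+1, dim)
        # time-domain queries attend to frequency-domain keys/values
        fused, _ = self.cross_attn(x, freq, freq)
        return self.classifier(fused.mean(dim=1))     # conversation-level logits

# toy usage: 2 conversations, 24 utterances each, 256-dim features
feats = torch.randn(2, 24, 256)
logits = TimeFrequencyFusion()(feats)
print(logits.shape)                                   # torch.Size([2, 7])
```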
Year: 2025
Page: 126-134
Language: English