Abstract:
The key challenge of cross-modal salient object detection lies in the representational discrepancy between different modal inputs. Existing methods typically employ only one encoding mode: either constrained encoding to extract modality-shared characteristics, or unconstrained encoding to capture modality-specific traits. Relying on a single paradigm limits the capture of salient cues and thus leads to poor generalization. We propose a novel learning paradigm named "Collaborating Constrained and Unconstrained Encodings" (CCUE), which integrates constrained and unconstrained feature extraction to discover richer salient cues. Accordingly, we build a CCUE network (CCUENet) consisting of a constrained branch and an unconstrained branch. The representations at each level of the two branches are integrated by an Information Selection and Fusion (ISF) module. The novelty of this module lies in selectively fusing the important information from each feature, based primarily on its response degree, which enables the network to aggregate effective cues for saliency detection. For training, we propose a Multi-scale Boundary Information (MBI) loss, which constrains the detection results to retain clear region boundaries and improves the model's robustness to variations in object scale. Under the supervision of the MBI loss, CCUENet outputs high-quality saliency maps. Experimental results show that CCUENet achieves superior performance on RGB-T and RGB-D datasets.
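The record does not include the paper's code. Purely as an illustration of what fusing two same-level branch features by response degree could look like, here is a minimal PyTorch-style sketch; the module name SelectiveFusion, the mean-activation "response" measure, and all shapes are assumptions for illustration, not the authors' ISF module.

import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    # Toy sketch of response-based selective fusion of two
    # same-level feature maps (NOT the paper's ISF module).
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv blends the weighted features after selection
        self.merge = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_con: torch.Tensor, f_unc: torch.Tensor) -> torch.Tensor:
        # Channel-wise global mean activation serves as a crude
        # "response degree" for each branch (an assumption here).
        r_con = f_con.mean(dim=(2, 3), keepdim=True)  # (N, C, 1, 1)
        r_unc = f_unc.mean(dim=(2, 3), keepdim=True)
        # Softmax across the two branches favors the stronger
        # per-channel response, then blends the weighted features.
        w = torch.softmax(torch.stack([r_con, r_unc], dim=0), dim=0)
        fused = w[0] * f_con + w[1] * f_unc
        return self.merge(fused)

# Usage: fuse level-3 features from two hypothetical branches.
fuse = SelectiveFusion(channels=64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])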
Source:
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE
ISSN: 2471-285X
Year: 2025
Impact Factor: 5.300 (JCR@2023)