Abstract:
Remote sensing (RS) audio-visual cross-modal retrieval is a challenging task in the search for meaningful RS information. The impact of multiscale features, and of the redundant information they carry in RS images, cannot be overlooked in the retrieval task. In addition, handling the completely different physical expressions of the two modalities is crucial for cross-modal retrieval. To tackle these issues, we propose a Scale-aware Adaptive Refinement and Cross-Interaction (SARCI) network. The Quaternion-attention Dominated Multiscale Visual Refinement (QDMVR) module in SARCI learns multiscale visual features and further refines features that contain redundant information at each scale. To better integrate channel attention and spatial attention for adaptive learning of meaningful visual semantics, we propose a symmetric quaternion attention (SQA) mechanism within the QDMVR module to enhance RS visual features. SQA acts on both high-level and low-level features to capture salient RS visual information across scales. To let information from different modalities interact more effectively, we propose an Instruction-based Cross-Learning Module (ICLM) that performs cross-modal feature interaction based on the characteristics of the two modalities. The SARCI network achieves state-of-the-art performance on three public RS audio-visual cross-modal datasets: Sydney, UCM, and RSICD. The code is available at: https://github.com/WUTCM-Lab/SARCI.
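The abstract describes SQA only at a high level, as a mechanism that fuses channel and spatial attention over visual feature maps. Purely as an illustration of that general idea (not the authors' actual quaternion formulation, whose details are not given here), a toy NumPy sketch of symmetrically combining a channel gate and a spatial gate might look like this; all function names and the averaging scheme are assumptions:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    # x: (C, H, W). Global average pooling per channel yields a (C,)
    # descriptor; a sigmoid turns it into a per-channel gate.
    gate = _sigmoid(x.mean(axis=(1, 2)))          # (C,)
    return x * gate[:, None, None]

def spatial_attention(x):
    # Mean over channels yields an (H, W) saliency map; a sigmoid
    # turns it into a per-location gate.
    gate = _sigmoid(x.mean(axis=0))               # (H, W)
    return x * gate[None, :, :]

def symmetric_attention(x):
    # Apply both branches and average them, so channel-wise and
    # spatial cues refine the feature map symmetrically.
    return 0.5 * (channel_attention(x) + spatial_attention(x))

# Toy multiscale feature map: 8 channels on a 4x4 grid.
feat = np.random.randn(8, 4, 4)
refined = symmetric_attention(feat)
print(refined.shape)  # (8, 4, 4): attention gates preserve the shape
```

In SARCI this kind of gating would be applied at both high-level and low-level feature scales; the sketch above only shows a single scale.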
Source: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
ISSN: 0196-2892
Year: 2024
Volume: 62
Impact Factor: 7.500 (JCR@2023)
ESI Highly Cited Papers on the List: 0