Author:

Chen, Xiumei [1] | Zheng, Xiangtao [2] | Lu, Xiaoqiang [3]

Indexed by:

Scopus SCIE

Abstract:

Remote sensing image-text retrieval (RSITR) is a cross-modal task that integrates visual and textual information, attracting significant attention in remote sensing research. Remote sensing images typically contain complex scenes with abundant details, presenting significant challenges for accurate semantic alignment between images and texts. Despite advances in the field, achieving precise alignment in such intricate contexts remains a major hurdle. To address this challenge, this article introduces a novel context-aware local-global semantic alignment (CLGSA) method. The proposed method consists of two key modules: the local key feature alignment (LKFA) module and the cross-sample global semantic alignment (CGSA) module. The LKFA module incorporates a local image masking and reconstruction task to improve the alignment between image and text features. Specifically, this module masks certain regions of the image and uses text context information to guide the reconstruction of the masked areas, enhancing the alignment of local semantics and ensuring more accurate retrieval of region-specific content. The CGSA module employs a hard sample triplet loss to improve global semantic consistency. By prioritizing difficult samples during training, this module refines feature space distributions, helping the model better capture global semantics across the entire image-text pair. A series of extensive experiments demonstrates the effectiveness of the proposed method. The method achieves an mR score of 32.07% on the RSICD dataset and 46.63% on the RSITMD dataset, outperforming baseline methods and confirming the robustness and accuracy of the approach.
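The hard sample triplet loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: cosine similarity between embeddings, in-batch mining of the single hardest negative in both retrieval directions, and a margin of 0.2. The function name `hard_triplet_loss` is hypothetical and is not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def hard_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional triplet loss with the hardest in-batch negative.

    Assumption: row i of img_emb and row i of txt_emb form a matched pair;
    every other row in the batch is treated as a negative.
    The margin value is illustrative, not taken from the paper.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))

    sim = img @ txt.T                      # sim[i, j] = cos(image i, text j)
    pos = np.diag(sim)                     # similarity of each matched pair

    # Exclude the matched pairs so they cannot be picked as negatives.
    is_pos = np.eye(sim.shape[0], dtype=bool)
    neg = np.where(is_pos, -np.inf, sim)

    hardest_txt = neg.max(axis=1)          # hardest negative text per image
    hardest_img = neg.max(axis=0)          # hardest negative image per text

    loss_i2t = np.maximum(0.0, margin + hardest_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_img - pos)
    return float((loss_i2t + loss_t2i).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))       # 4 toy image embeddings
    texts = rng.normal(size=(4, 8))        # 4 toy text embeddings
    print(hard_triplet_loss(images, texts))
```

Selecting only the hardest negative concentrates the gradient on the most confusable pairs, which matches the abstract's description of prioritizing difficult samples. Averaging over all negatives would be the usual alternative.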

Keyword:

Accuracy; Cross modal retrieval; Feature extraction; Hard sample triplet loss; Image reconstruction; local image masking; Remote sensing; remote sensing image-text retrieval (RSITR); semantic alignment; Semantics; Sensors; text-guided reconstruction; Training; Transformers; Visualization

Affiliation:

  • [1] [Chen, Xiumei] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China
  • [2] [Zheng, Xiangtao] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China
  • [3] [Lu, Xiaoqiang] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China

Reprint Author's Address:

  • [Zheng, Xiangtao] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

ISSN: 0196-2892

Year: 2025

Volume: 63

Impact Factor: 7.500 (JCR@2023)
