VGRSS: Datasets and Models for Visual Grounding in Remote Sensing Ship Images

Authors：

Chen, Yaxiong ^[1] | Zhan, Liwen ^[2] | Zhao, Yichen ^[3] | Xiong, Shengwu ^[4] | Lu, Xiaoqiang ^[5]

Indexed by：

EI Scopus SCIE

Abstract：

This article introduces a task named visual grounding of remote sensing ship (VGRSS) images. The goal of VGRSS is to locate ship objects in remote sensing images guided by natural language. Extensive research has been conducted on multimodal processing of remote sensing images and text to retrieve rich information from remote sensing images using natural language. However, due to the unique characteristics of remote sensing ship images, ship localization using natural language remains a challenge. Therefore, in this work, we construct datasets for the VGRSS task and explore deep learning models. Specifically, our contributions can be summarized as follows: first, we construct two remote sensing ship datasets for visual grounding. One is based on the optical remote sensing ship target detection benchmark dataset, named RSSVG, while the other is based on the synthetic aperture radar (SAR) dataset, named SARVG. Second, we propose a language-guided visual feature enhancement (LVFE) module. This module enhances visual features through language guidance before visual-linguistic fusion (VLF). Third, we propose a VLF module based on multimodal feature stacking. This module inputs the stacked language and visual features, and then performs feature fusion using a Transformer, enabling effective cross-modal interaction and integration. Fourth, we introduce a novel loss calculation method by incorporating enhanced intersection over union (EIoU) into the loss function. Finally, we benchmark extensive state-of-the-art (SOTA) natural image visual grounding (VG) methods on the constructed RSSVG and SARVG datasets, then provide insightful analysis based on the results. This work offers valuable insights for developing better VGRSS models.
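
The abstract mentions incorporating EIoU into the loss function but does not give its definition. The following is a minimal sketch only, assuming the commonly published EIoU formulation (an IoU term plus penalties on center distance and on width and height differences, each normalized by the smallest enclosing box). The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` parameter are illustrative assumptions and are not taken from the paper, whose actual loss may differ.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU-style loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed formulation (not confirmed by the abstract):
    1 - IoU + center-distance term + width term + height term.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height mismatches, normalized by the enclosing box sides.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Under these assumptions the loss is close to zero for identical boxes and stays informative for non-overlapping boxes, because the center-distance term still varies when the IoU is zero.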

Keywords：

Accuracy; Artificial intelligence; Benchmark testing; Feature extraction; Grounding; Language-guided visual feature enhancement (LVFE); Linguistics; Marine vehicles; Remote sensing; Transformer; Transformers; VG of remote sensing ship (VGRSS) images; visual grounding (VG) dataset; Visualization

Affiliations：

[ 1 ] [Chen, Yaxiong]Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China
[ 2 ] [Zhan, Liwen]Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China
[ 3 ] [Zhao, Yichen]Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China
[ 4 ] [Chen, Yaxiong]Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[ 5 ] [Zhan, Liwen]Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[ 6 ] [Zhao, Yichen]Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[ 7 ] [Chen, Yaxiong]Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[ 8 ] [Zhan, Liwen]Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[ 9 ] [Zhao, Yichen]Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[ 10 ] [Xiong, Shengwu]Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[ 11 ] [Xiong, Shengwu]Wuhan Coll, Interdisciplinary Artificial Intelligence Res Inst, Wuhan 430212, Peoples R China
[ 12 ] [Lu, Xiaoqiang]Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350108, Peoples R China

Reprint Address：

[Xiong, Shengwu]Wuhan Coll, Interdisciplinary Artificial Intelligence Res Inst, Wuhan 430212, Peoples R China

Email：

xiongsw@whut.edu.cn

Version：

VGRSS: Datasets and Models for Visual Grounding in Remote Sensing Ship Images
2025，IEEE Transactions on Geoscience and Remote Sensing

Related Articles：

A Spatial and Semantic Alignment Fusion Network for SeaLand Port Segmentation
2025，IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Bilinear Parallel Fourier Transformer for Multimodal Remote Sensing Classification
2025，IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
Building Type Classification Using CNN-Transformer Cross-Encoder Adaptive Learning From Very High Resolution Satellite Images
2025，IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Context-Aware Local-Global Semantic Alignment for Remote Sensing Image-Text Retrieval
2025，IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Source ：

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

ISSN： 0196-2892

Year： 2025

Volume： 63

Impact Factor： 7.500 (JCR@2023)

CAS Journal Grade：1

Cited Count：

WoS CC Cited Count：

SCOPUS Cited Count：

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30-Day Page Views： 11

Affiliated Colleges：

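College of Physics and Information Engineering / School of Microelectronics (data not explicitly attributed within this college/department)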
