VGRSS: Datasets and Models for Visual Grounding in Remote Sensing Ship Images

Authors：

Chen, Y. [1] | Zhan, L. [2] | Zhao, Y. [3] | Xiong, S. [4] | Lu, X. [5]

Indexed by：

Scopus

Abstract：

This article introduces a task named visual grounding of remote sensing ship (VGRSS) images. The goal of VGRSS is to locate ship objects in remote sensing images guided by natural language. Extensive research has been conducted on multimodal processing of remote sensing images and text to retrieve rich information from remote sensing images using natural language. However, due to the unique characteristics of remote sensing ship images, ship localization using natural language remains a challenge. Therefore, in this work, we construct datasets for the VGRSS task and explore deep learning models. Specifically, our contributions can be summarized as follows: first, we construct two remote sensing ship datasets for visual grounding. One is based on the optical remote sensing ship target detection benchmark dataset, named RSSVG, while the other is based on the synthetic aperture radar (SAR) dataset, named SARVG. Second, we propose a language-guided visual feature enhancement (LVFE) module. This module enhances visual features through language guidance before visual-linguistic fusion (VLF). Third, we propose a VLF module based on multimodal feature stacking. This module inputs the stacked language and visual features, and then performs feature fusion using a Transformer, enabling effective cross-modal interaction and integration. Fourth, we introduce a novel loss calculation method by incorporating enhanced intersection over union (EIoU) into the loss function. Finally, we benchmark extensive state-of-the-art (SOTA) natural image visual grounding (VG) methods on the constructed RSSVG and SARVG datasets, then provide insightful analysis based on the results. This work offers valuable insights for developing better VGRSS models. © 1980-2012 IEEE.

Keyword：

Language-guided visual feature enhancement (LVFE); Transformer; VG of remote sensing ship (VGRSS) images; visual grounding (VG) dataset

Affiliations：

[ 1 ] [Chen Y.]Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya, 572000, China
[ 2 ] [Chen Y.]Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan, 430070, China
[ 3 ] [Chen Y.]Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
[ 4 ] [Zhan L.]Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya, 572000, China
[ 5 ] [Zhan L.]Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan, 430070, China
[ 6 ] [Zhan L.]Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
[ 7 ] [Zhao Y.]Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya, 572000, China
[ 8 ] [Zhao Y.]Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan, 430070, China
[ 9 ] [Zhao Y.]Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
[ 10 ] [Xiong S.]Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
[ 11 ] [Xiong S.]Interdisciplinary Artificial Intelligence Research Institute, Wuhan College, Wuhan, 430212, China
[ 12 ] [Lu X.]Fuzhou University, College of Physics and Information Engineering, Fuzhou, 350108, China

Source ：

IEEE Transactions on Geoscience and Remote Sensing

ISSN： 0196-2892

Year： 2025

Volume： 63

Impact Factor： 7.500 (JCR@2023)

ESI Highly Cited Papers on the List： 0

30 Days PV： 2
