VGRSS: Datasets and Models for Visual Grounding in Remote Sensing Ship Images

Authors:

Chen, Yaxiong | Zhan, Liwen | Zhao, Yichen | Xiong, Shengwu | Lu, Xiaoqiang

Abstract:

This article introduces a task named visual grounding of remote sensing ship (VGRSS) images. The goal of VGRSS is to locate ship objects in remote sensing images guided by natural language. Extensive research has been conducted on multimodal processing of remote sensing images and text to retrieve rich information from remote sensing images using natural language. However, due to the unique characteristics of remote sensing ship images, ship localization using natural language remains a challenge. Therefore, in this work, we construct datasets for the VGRSS task and explore deep learning models. Specifically, our contributions can be summarized as follows. First, we construct two remote sensing ship datasets for visual grounding: RSSVG, based on the optical remote sensing ship target detection benchmark dataset, and SARVG, based on the synthetic aperture radar (SAR) dataset. Second, we propose a language-guided visual feature enhancement (LVFE) module, which enhances visual features through language guidance before visual-linguistic fusion (VLF). Third, we propose a VLF module based on multimodal feature stacking. This module takes the stacked language and visual features as input and fuses them with a Transformer, enabling effective cross-modal interaction and integration. Fourth, we introduce a novel loss calculation method that incorporates enhanced intersection over union (EIoU) into the loss function. Finally, we benchmark a wide range of state-of-the-art (SOTA) natural image visual grounding (VG) methods on the constructed RSSVG and SARVG datasets and provide an analysis of the results. This work offers valuable insights for developing better VGRSS models. © 1980-2012 IEEE.
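The abstract names EIoU as the term added to the loss but gives neither its formula nor how it is weighted against other terms. As a hedged illustration only, the sketch below computes the commonly used EIoU loss for one pair of axis-aligned boxes in (x1, y1, x2, y2) format: 1 - IoU, plus the squared center distance divided by the squared diagonal of the smallest enclosing box, plus the squared width and height differences divided by the squared enclosing-box width and height. The `Box` class, the `eiou_loss` name, and the `eps` guard are hypothetical choices for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass


# Hypothetical helper types and names; not the VGRSS authors' code.
@dataclass(frozen=True)
class Box:
    """Axis-aligned box in (x1, y1, x2, y2) corner format, assuming x2 >= x1 and y2 >= y1."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def w(self) -> float:
        return self.x2 - self.x1

    @property
    def h(self) -> float:
        return self.y2 - self.y1

    @property
    def cx(self) -> float:
        return (self.x1 + self.x2) / 2

    @property
    def cy(self) -> float:
        return (self.y1 + self.y2) / 2


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Standard EIoU loss for a single predicted/ground-truth box pair (assumed formulation)."""
    # Intersection and union areas give the IoU term.
    iw = max(0.0, min(pred.x2, gt.x2) - max(pred.x1, gt.x1))
    ih = max(0.0, min(pred.y2, gt.y2) - max(pred.y1, gt.y1))
    inter = iw * ih
    union = pred.w * pred.h + gt.w * gt.h - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred.x2, gt.x2) - min(pred.x1, gt.x1)
    ch = max(pred.y2, gt.y2) - min(pred.y1, gt.y1)
    diag2 = cw ** 2 + ch ** 2

    # Center-distance penalty, normalized by the enclosing-box diagonal.
    center2 = (pred.cx - gt.cx) ** 2 + (pred.cy - gt.cy) ** 2

    # Width and height penalties, normalized by the enclosing-box sides.
    w_term = (pred.w - gt.w) ** 2 / (cw ** 2 + eps)
    h_term = (pred.h - gt.h) ** 2 / (ch ** 2 + eps)

    return 1.0 - iou + center2 / (diag2 + eps) + w_term + h_term


if __name__ == "__main__":
    # A prediction twice as wide as the target: IoU 0.5, small center and width penalties.
    print(eiou_loss(Box(0, 0, 2, 1), Box(0, 0, 1, 1)))
```

Unlike plain IoU loss, this term stays informative when the boxes do not overlap, because the center and size penalties still shrink as the prediction moves toward the target. How the paper averages it over a batch or combines it with other loss terms is not stated in this record.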

Keywords:

Deep learning; Image enhancement; Linguistics; Modeling languages; Natural language processing systems; Problem oriented languages; Ships; Visual languages

Affiliations:

[1] [Chen, Yaxiong] Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya; 572000, China
[2] [Chen, Yaxiong] Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan; 430070, China
[3] [Chen, Yaxiong] Shanghai Artificial Intelligence Laboratory, Shanghai; 200232, China
[4] [Zhan, Liwen] Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya; 572000, China
[5] [Zhan, Liwen] Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan; 430070, China
[6] [Zhan, Liwen] Shanghai Artificial Intelligence Laboratory, Shanghai; 200232, China
[7] [Zhao, Yichen] Wuhan University of Technology, Sanya Science and Education Innovation Park, Sanya; 572000, China
[8] [Zhao, Yichen] Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan; 430070, China
[9] [Zhao, Yichen] Shanghai Artificial Intelligence Laboratory, Shanghai; 200232, China
[10] [Xiong, Shengwu] Shanghai Artificial Intelligence Laboratory, Shanghai; 200232, China
[11] [Xiong, Shengwu] Interdisciplinary Artificial Intelligence Research Institute, Wuhan College, Wuhan; 430212, China
[12] [Lu, Xiaoqiang] Fuzhou University, College of Physics and Information Engineering, Fuzhou; 350108, China

Reprint Address:

[Xiong, Shengwu] Interdisciplinary Artificial Intelligence Research Institute, Wuhan College, Wuhan; 430212, China; [Xiong, Shengwu] Shanghai Artificial Intelligence Laboratory, Shanghai; 200232, China

Email:

xiongsw@whut.edu.cn

Source:

IEEE Transactions on Geoscience and Remote Sensing

ISSN: 0196-2892

Year: 2025

Volume: 63

Impact Factor (JCR@2023): 7.500

CAS Journal Grade: 1

ESI Highly Cited Papers on the List: 0