Abstract:
Underwater Image Enhancement (UIE) is critical for numerous marine applications; however, existing methods often fall short in addressing severe color distortion, detail loss, and a lack of semantic understanding, particularly under spatially varying degradation conditions. While Generative AI (GenAI), particularly diffusion models and multimodal large language models (MLLMs), offers new prospects for UIE, effectively leveraging their capabilities for fine-grained, semantics-aware enhancement remains a challenge. We propose a LLaVA-based semantic feature modulation diffusion model (LSFM-Diff), which collaboratively integrates multi-level semantic guidance into the backbone network of the diffusion model. Specifically, an optimized prompt-learning strategy is first employed to obtain concise, UIE-relevant textual descriptions from LLaVA. These semantics then guide the enhancement process in two key stages: (1) the windowed text-image fusion for condition refinement (WTIF-CR) module spatially aligns and fuses textual semantics with local image features, generating fine-grained external conditions that provide an initial spatially aware semantic blueprint for the diffusion model; (2) the semantic-guided deformable attention (SGDA) mechanism leverages gradient-based image-text interaction to generate a semantic navigation map that guides attention within the denoising network toward key semantic regions. Experiments on several challenging benchmark datasets demonstrate that LSFM-Diff outperforms current state-of-the-art methods. Our work highlights the effectiveness of deeply integrating multi-level semantic guidance in advancing GenAI-based UIE. © 2025
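The window-wise fusion of a global text embedding with local image features described in stage (1) can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the function name, shapes, and softmax-pooling scheme are all illustrative assumptions about how a text vector might produce a spatially varying condition map.

```python
import numpy as np

def windowed_text_image_fusion(img_feats, txt_emb, window=4):
    """Hedged sketch of window-wise text-image fusion (WTIF-CR idea).

    Within each spatial window, pixels are weighted by their similarity
    to the text embedding, and the weighted summary is broadcast back,
    yielding a per-window semantic condition map. Shapes and names are
    hypothetical, not the paper's API.
    """
    H, W, C = img_feats.shape
    cond = np.zeros_like(img_feats)
    for y in range(0, H, window):
        for x in range(0, W, window):
            win = img_feats[y:y + window, x:x + window].reshape(-1, C)
            # similarity of each window pixel to the text semantics
            scores = win @ txt_emb
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # window-level semantic summary, broadcast spatially
            summary = (weights[:, None] * win).sum(axis=0)
            cond[y:y + window, x:x + window] = summary
    return cond

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 16))  # toy feature map (H, W, C)
text = rng.standard_normal(16)           # toy text embedding
cond = windowed_text_image_fusion(feats, text)
print(cond.shape)  # (8, 8, 16)
```

In the paper this conditioning signal is fed to the diffusion backbone as an external condition; here it is only returned for inspection.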
Source:
Information Fusion
ISSN: 1566-2535
Year: 2026
Volume: 126
Impact Factor: 14.800 (JCR@2023)
ESI Highly Cited Papers on the List: 0