
author:

Lai, Xinyi [1] | Ke, Xiao [2] | Xu, Huangbiao [3] | Wu, Shanghui [4] | Guo, Wenzhong [5]

Indexed by:

EI SCIE

Abstract:

Multimodal prompt learning has emerged as an effective strategy for adapting vision-language models such as CLIP to downstream tasks. However, conventional approaches typically operate at the input level, forcing learned prompts to propagate through a sequence of frozen Transformer layers. This indirect adaptation introduces cumulative geometric distortions, a limitation that we formalize as the indirect learning dilemma (ILD), leading to overfitting on base classes and reduced generalization to novel classes. To overcome this challenge, we propose the Multimodal Self-Attention Prompt (MSP) framework, which shifts adaptation into the semantic core of the model by injecting learnable prompts directly into the key and value sequences of attention blocks. This direct modulation preserves the pretrained embedding geometry while enabling more precise downstream adaptation. MSP further incorporates distance-aware optimization to maintain semantic consistency with CLIP's original representation space, and partial prompt learning via stochastic dimension masking to improve robustness and prevent over-specialization. Extensive evaluations across 11 benchmarks demonstrate the effectiveness of MSP. It achieves a state-of-the-art harmonic mean accuracy of 80.67%, with 77.32% accuracy on novel classes (a 2.18% absolute improvement over prior methods), while requiring only 0.11M learnable parameters. Notably, MSP surpasses CLIP's zero-shot performance on 10 out of 11 datasets, establishing a new paradigm for efficient and generalizable prompt-based adaptation.
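The abstract's central mechanism, prepending learnable prompt tokens to the key and value sequences inside an otherwise frozen attention block (queries come only from the real input, so no prompt propagates through subsequent layers as a token), can be sketched as follows. This is a minimal single-head NumPy illustration under our own assumptions; the function name `attention_with_kv_prompts` and the prompt matrices `Pk`/`Pv` are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kv_prompts(x, Wq, Wk, Wv, Pk, Pv):
    """Single-head self-attention with learnable key/value prompts.

    x        : (n, d) input token embeddings
    Wq,Wk,Wv : (d, d) frozen projection weights
    Pk, Pv   : (m, d) learnable prompts, prepended to keys and values only
    returns  : (n, d) attended output for the n real tokens
    """
    q = x @ Wq                       # queries from real tokens only
    k = np.vstack([Pk, x @ Wk])      # (m + n, d): prompt keys + input keys
    v = np.vstack([Pv, x @ Wv])      # (m + n, d): prompt values + input values
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v       # each real token can attend to prompts
```

Because only `Pk` and `Pv` would be trained while the projections stay frozen, the output for each real token is a convex combination of the original value vectors and the prompt values, which is consistent with the abstract's claim that this modulation leaves the pretrained embedding geometry largely intact.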

Keyword:

Adaptation models; Distortion; Few-shot learning; Geometry; Image classification; Optimization; Prompt learning; Semantics; Training; Transfer learning; Transformers; Tuning; Vectors; Vision-language model; Visualization

Community:

  • [ 1 ] [Lai, Xinyi]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
  • [ 2 ] [Ke, Xiao]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
  • [ 3 ] [Xu, Huangbiao]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
  • [ 4 ] [Wu, Shanghui]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
  • [ 5 ] [Guo, Wenzhong]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China
  • [ 6 ] [Lai, Xinyi]Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350116, Peoples R China
  • [ 7 ] [Ke, Xiao]Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350116, Peoples R China
  • [ 8 ] [Xu, Huangbiao]Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350116, Peoples R China
  • [ 9 ] [Wu, Shanghui]Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350116, Peoples R China
  • [ 10 ] [Guo, Wenzhong]Minist Educ, Engn Res Ctr Big Data Intelligence, Fuzhou 350116, Peoples R China

Reprint Author's Address:

  • Ke, Xiao (柯逍)

    [Ke, Xiao]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China; [Guo, Wenzhong]Fuzhou Univ, Coll Comp & Data Sci, Fujian Key Lab Network Comp & Intelligent Informat, Fuzhou 350116, Peoples R China


Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING

ISSN: 1057-7149

Year: 2025

Volume: 34

Page: 5978-5988

Impact Factor: 10.800 (JCR@2023)


ESI Highly Cited Papers on the List: 0

