Abstract:
In recent years, transformers have shown strong potential for occluded person re-identification, but existing methods still face key challenges: in complex occlusion scenes, the global processing mechanism of the traditional Vision Transformer (ViT) struggles to distinguish target features from occlusion noise, and insufficient use of low-level features limits model robustness. To address this, this paper proposes a multi-layer feature fusion framework based on dynamic token compensation and cross-layer feature interaction (DTC-CINet). First, an attention-driven token screening and compensation module is designed, which partitions the image tokens into a target set and a background set via dynamic weight computation and uses a multilayer perceptron to transform redundant tokens, strengthening the model's focus on key regions. Second, a cross-layer feature interaction module is proposed that builds an attention mapping network to fuse low-level positional information with high-level semantic features, improving reasoning over occluded regions. Experiments show that the proposed method significantly outperforms existing methods on mainstream datasets and is especially robust under occlusion. © 2025 IEEE.
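The abstract describes the two modules only at a high level. The following is a minimal PyTorch-style sketch of how such modules could be structured; the class names, the keep ratio, and the single-compensation-token design are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two modules described in the abstract.
# The paper's real design, layer choices, and hyper-parameters are not given here.
import torch
import torch.nn as nn


class TokenScreeningCompensation(nn.Module):
    """Splits patch tokens into a target set and a background set using the
    [CLS] attention weights, then compresses the background (redundant) tokens
    with an MLP into one compensation token. keep_ratio is an assumption."""

    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, tokens: torch.Tensor, cls_attn: torch.Tensor):
        # tokens: (B, N, D) patch tokens; cls_attn: (B, N) attention of [CLS] to patches
        B, N, D = tokens.shape
        k = max(1, int(N * self.keep_ratio))
        idx = cls_attn.topk(k, dim=1).indices                       # target token indices
        target = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))
        mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
        mask.scatter_(1, idx, False)                                # background = the rest
        background = tokens[mask].view(B, N - k, D)
        comp = self.mlp(background).mean(dim=1, keepdim=True)       # one compensation token
        return torch.cat([target, comp], dim=1)                     # (B, k + 1, D)


class CrossLayerInteraction(nn.Module):
    """Fuses low-level (positional) and high-level (semantic) features with a
    cross-attention mapping: deep-layer tokens query the shallow-layer map."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, high: torch.Tensor, low: torch.Tensor):
        # high: (B, M, D) deep-layer tokens; low: (B, N, D) shallow-layer tokens
        fused, _ = self.attn(query=high, key=low, value=low)
        return self.norm(high + fused)                              # residual fusion
```

In a full model, the screening step would presumably be applied at intermediate ViT blocks, with cls_attn taken from each block's attention map, but the abstract does not specify where the modules are inserted.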
Year: 2025
Page: 214-222
Language: English