Abstract:
Human pose estimation in crowded scenes is a challenging task. Due to overlap and occlusion, it is difficult to infer pose clues from individual keypoints. We propose PFFormer, a new transformer-based approach that treats pose estimation as a hierarchical set prediction problem: it first focuses on human windows and coarsely predicts whole-body poses globally within them. In PFFormer, we design a Windows Clustering Transformer (WCT), which reorganizes the image windows by filtering the attentive windows and fusing the inattentive ones. This allows the transformer to focus on the important regions while reducing interference from the complex background; a global transformer then compensates for the resulting loss of information. Next, we partition the learned body pose into a set of structural parts and apply the Inter-Part Relation Module (IPRM) to capture the correlation between multiple parts. The full-body poses and part features are then refined at a finer level by the Part-to-Joint Decoder (PJD). Extensive experiments show that PFFormer performs favorably against its counterparts on challenging datasets, including COCO2017, CrowdPose, and OCHuman. Its performance in crowded scenes, in particular, demonstrates the robustness of the proposed method to occlusion. © 2023 Elsevier Ltd
Source:
Expert Systems with Applications
ISSN: 0957-4174
Year: 2024
Volume: 240
JCR@2023: 7.500
CAS Journal Grade: 2
ESI Highly Cited Papers on the List: 0