CTHPose: An Efficient and Effective CNN-Transformer Hybrid Network for Human Pose Estimation

Author：

Chen, D. [1] | Wu, L. [2] | Chen, Z. [3] | Lin, X. [4]

Indexed by：

Scopus

Abstract：

Recently, CNN-Transformer hybrid networks have been proposed to address either the heavy computational burden of CNNs or the difficulty of training Transformer-based networks. In this work, we design an efficient and effective CNN-Transformer hybrid network for human pose estimation, namely CTHPose. Specifically, a Polarized CNN Module is employed to extract features with plentiful visual semantic clues, which benefits the convergence of the subsequent Transformer encoders. A Pyramid Transformer Module is utilized to build long-term relationships between human body parts with a lightweight structure and lower computational complexity. Establishing long-term relationships requires a large field of view in the Transformer, which leads to a large computational workload. Hence, instead of the entire feature map, we introduce a reorganized small sliding window to provide the required large field of view. Finally, a Heatmap Generator is designed to reconstruct the 2D heatmaps from the 1D keypoint representation, which balances parameters and FLOPs while obtaining accurate predictions. In quantitative comparison experiments with CNN estimators, CTHPose significantly reduces the number of network parameters and GFLOPs while also providing better detection accuracy. Compared with mainstream pure Transformer networks and state-of-the-art CNN-Transformer hybrid networks, this network also has competitive performance and, from the visual perspective, is more robust to clothing pattern interference and overlapping limbs. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keyword：

Human pose estimation; Long-range dependency; Transformer

Community：

[1] [Chen D.] College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
[2] [Wu L.] College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
[3] [Chen Z.] College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
[4] [Lin X.] College of Physics and Information Engineering, Fuzhou University, Fuzhou, China

Related Keywords：

CTHPose: An Efficient and Effective CNN-Transformer Hybrid Network for Human Pose Estimation
2024, PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V
Pose focus transformer meet inter-part relation
2023, EXPERT SYSTEMS WITH APPLICATIONS
RawFormer: An Efficient Vision Transformer for Low-Light RAW Image Enhancement
2022, IEEE SIGNAL PROCESSING LETTERS
Spectrum-Induced Transformer-Based Feature Learning for Multiple Change Detection in Hyperspectral Images
2024, IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
Hyperspectral image change detection based on an improved multi-scale and spectral-wise transformer
2024, INTERNATIONAL JOURNAL OF REMOTE SENSING

Source ：

ISSN： 0302-9743

Year： 2024

Volume： 14429 LNCS

Page： 327-339

Language： English

0.402 (JCR@2005)

ESI Highly Cited Papers on the List： 0
