Abstract
Accurately locating invisible keypoints in complex, occluded scenes is a critical challenge for human pose estimation (HPE): it limits the effectiveness of existing HPE methods and calls for more robust and precise approaches. To address these limitations and improve the performance and applicability of HPE systems in real-world scenarios, we propose PoseDucer, a two-stage model. The first stage extracts rich visual features with an off-the-shelf backbone network; the second stage explores reliable spatial structural relationships through two branches, each of which implicitly models human structural relations from a different perspective. The network is guided to adaptively reconstruct invisible keypoints from high-confidence visible keypoints and part-related structural cues. Extensive experiments on the COCO, CrowdPose, and OCHuman datasets demonstrate that PoseDucer achieves state-of-the-art performance. On the particularly challenging OCHuman dataset, it reaches an AP of 68.3%, significantly surpassing existing methods. Qualitative analysis of heavily occluded examples further confirms that PoseDucer reconstructs invisible keypoints more logically and accurately than current methods, and the visualization results reveal the model's interpretability in correcting inaccurate initial estimates of these keypoints.
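The abstract's central idea, reconstructing low-confidence (invisible) keypoints from high-confidence visible ones using part-related structural cues, can be illustrated with a minimal hand-crafted sketch. This is not the paper's learned mechanism (PoseDucer models structure implicitly with two network branches); the function name, skeleton edges, canonical bone offsets, and confidence threshold below are all hypothetical, chosen only to show the reconstruction principle.

```python
def reconstruct_invisible(kps, conf, skeleton, mean_offsets, thresh=0.5):
    """Replace each low-confidence keypoint with the average prediction
    obtained from its visible skeletal neighbors plus a canonical bone
    offset. A toy stand-in for learned structural reconstruction.

    kps:          list of (x, y) keypoint estimates
    conf:         per-keypoint confidence scores
    skeleton:     list of (parent, child) index pairs (hypothetical edges)
    mean_offsets: per-edge canonical (dx, dy) bone vectors, parent -> child
    """
    out = [list(p) for p in kps]
    for j in range(len(kps)):
        if conf[j] >= thresh:
            continue  # keypoint is confidently visible; keep its estimate
        preds = []
        for (a, b), (dx, dy) in zip(skeleton, mean_offsets):
            if b == j and conf[a] >= thresh:
                # visible parent predicts the child along the bone vector
                preds.append((kps[a][0] + dx, kps[a][1] + dy))
            elif a == j and conf[b] >= thresh:
                # visible child predicts the parent against the bone vector
                preds.append((kps[b][0] - dx, kps[b][1] - dy))
        if preds:
            out[j] = [sum(p[0] for p in preds) / len(preds),
                      sum(p[1] for p in preds) / len(preds)]
    return out
```

For example, with a single shoulder-to-elbow edge and a visible shoulder at the origin, an occluded elbow would be relocated to the shoulder position plus the canonical bone offset, regardless of its unreliable initial estimate.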
| Original language | English |
|---|---|
| Article number | 130328 |
| Journal | Neurocomputing |
| Volume | 640 |
| DOIs | |
| State | Published - 1 Aug 2025 |
Keywords
- 2D pose estimation
- Human structure modeling
- Occlusion
- PoseTransformer
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence