Abstract
Accurately locating invisible keypoints in complex, occluded scenes is a critical challenge for human pose estimation (HPE): it limits the effectiveness of existing HPE methods and calls for more robust and precise approaches. To address these limitations and improve the performance and applicability of HPE systems in real-world scenarios, we propose PoseDucer, a two-stage model. The first stage extracts rich visual features with an off-the-shelf backbone network; the second stage explores reliable spatial structural relationships through two branches, each of which implicitly models human structural relations from a different perspective. The network is guided to adaptively reconstruct invisible keypoints from high-confidence visible keypoints and part-related structural cues. Extensive experiments on the COCO, CrowdPose, and OCHuman datasets demonstrate that PoseDucer achieves state-of-the-art performance. On the particularly challenging OCHuman dataset, it reaches an AP of 68.3%, significantly surpassing existing methods. Qualitative analysis of heavily occluded examples further confirms that PoseDucer reconstructs invisible keypoints more logically and accurately than current methods, and the visualization results reveal the model's interpretability in correcting inaccurate initial estimates of these keypoints.
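The abstract's central idea, reconstructing low-confidence (invisible) keypoints from high-confidence visible ones using part-related structural cues, can be illustrated with a minimal hand-crafted sketch. This is not the paper's learned mechanism (PoseDucer models structure implicitly with two network branches); the function name, skeleton edges, canonical bone offsets, and confidence threshold below are all hypothetical, chosen only to show the reconstruction principle.

```python
def reconstruct_invisible(kps, conf, skeleton, mean_offsets, thresh=0.5):
    """Replace each low-confidence keypoint with the average prediction
    obtained from its visible skeletal neighbors plus a canonical bone
    offset. A toy stand-in for learned structural reconstruction.

    kps:          list of (x, y) keypoint estimates
    conf:         per-keypoint confidence scores
    skeleton:     list of (parent, child) index pairs (hypothetical edges)
    mean_offsets: per-edge canonical (dx, dy) bone vectors, parent -> child
    """
    out = [list(p) for p in kps]
    for j in range(len(kps)):
        if conf[j] >= thresh:
            continue  # keypoint is confidently visible; keep its estimate
        preds = []
        for (a, b), (dx, dy) in zip(skeleton, mean_offsets):
            if b == j and conf[a] >= thresh:
                # visible parent predicts the child along the bone vector
                preds.append((kps[a][0] + dx, kps[a][1] + dy))
            elif a == j and conf[b] >= thresh:
                # visible child predicts the parent against the bone vector
                preds.append((kps[b][0] - dx, kps[b][1] - dy))
        if preds:
            out[j] = [sum(p[0] for p in preds) / len(preds),
                      sum(p[1] for p in preds) / len(preds)]
    return out
```

For example, with a single shoulder-to-elbow edge and a visible shoulder at the origin, an occluded elbow would be relocated to the shoulder position plus the canonical bone offset, regardless of its unreliable initial estimate.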
| Original language | English |
|---|---|
| Article number | 130328 |
| Journal | Neurocomputing |
| Volume | 640 |
| DOIs | |
| State | Published - 1 Aug 2025 |
Keywords
- 2D pose estimation
- Human structure modeling
- Occlusion
- PoseTransformer
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence