TY - JOUR
T1 - Skeleton-based deep pose feature learning for action quality assessment on figure skating videos
AU - Li, Huiying
AU - Lei, Qing
AU - Zhang, Hongbo
AU - Du, Jixiang
AU - Gao, Shangce
N1 - Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/11
Y1 - 2022/11
N2 - Most existing Action Quality Assessment (AQA) methods for scoring sports videos have thoroughly studied how to evaluate a single action or several sequentially defined actions performed in short-term sports videos, such as diving and vault. They attempt to extract features directly from RGB videos through 3D ConvNets, which mixes the features with ambiguous scene information. To investigate the effectiveness of deep pose feature learning for automatically evaluating complex activities in long-duration sports videos, such as figure skating and artistic gymnastics, we propose a skeleton-based deep pose feature learning method to address this problem. For pose feature extraction, a spatial–temporal pose extraction module (STPE) is built to capture subtle changes in human body movement and obtain detailed representations of skeletal data in the space and time dimensions. For temporal information representation, an inter-action temporal relation extraction module (ATRE) is implemented with a recurrent neural network to model the dynamic temporal structure of skeletal subsequences. We evaluate the proposed method on the figure skating activities of the MIT-Skate and FIS-V datasets. The experimental results show that the proposed method is more effective than RGB video-based deep feature learning methods, including SENet and C3D. A significant improvement in Spearman Rank Correlation (SRC) is achieved on the MIT-Skate dataset. On the FIS-V dataset, better SRC and MSE between the predicted scores and the judges' scores are achieved for both the Total Element Score (TES) and the Program Component Score (PCS) when compared with the SENet and C3D feature methods.
AB - Most existing Action Quality Assessment (AQA) methods for scoring sports videos have thoroughly studied how to evaluate a single action or several sequentially defined actions performed in short-term sports videos, such as diving and vault. They attempt to extract features directly from RGB videos through 3D ConvNets, which mixes the features with ambiguous scene information. To investigate the effectiveness of deep pose feature learning for automatically evaluating complex activities in long-duration sports videos, such as figure skating and artistic gymnastics, we propose a skeleton-based deep pose feature learning method to address this problem. For pose feature extraction, a spatial–temporal pose extraction module (STPE) is built to capture subtle changes in human body movement and obtain detailed representations of skeletal data in the space and time dimensions. For temporal information representation, an inter-action temporal relation extraction module (ATRE) is implemented with a recurrent neural network to model the dynamic temporal structure of skeletal subsequences. We evaluate the proposed method on the figure skating activities of the MIT-Skate and FIS-V datasets. The experimental results show that the proposed method is more effective than RGB video-based deep feature learning methods, including SENet and C3D. A significant improvement in Spearman Rank Correlation (SRC) is achieved on the MIT-Skate dataset. On the FIS-V dataset, better SRC and MSE between the predicted scores and the judges' scores are achieved for both the Total Element Score (TES) and the Program Component Score (PCS) when compared with the SENet and C3D feature methods.
KW - Action quality assessment
KW - Action relation learning
KW - Figure skating sport videos
KW - Spatial–temporal pose feature extraction
UR - http://www.scopus.com/inward/record.url?scp=85144358823&partnerID=8YFLogxK
U2 - 10.1016/j.jvcir.2022.103625
DO - 10.1016/j.jvcir.2022.103625
M3 - Journal article
AN - SCOPUS:85144358823
SN - 1047-3203
VL - 89
JO - Journal of Visual Communication and Image Representation
JF - Journal of Visual Communication and Image Representation
M1 - 103625
ER -