Multi-skeleton structures graph convolutional network for action quality assessment in long videos

Qing Lei; Huiying Li; Hongbo Zhang; Jixiang Du; Shangce Gao

doi:10.1007/s10489-023-04613-5

Qing Lei, Huiying Li, Hongbo Zhang, Jixiang Du, Shangce Gao^*

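^* Corresponding author for this work

Department of Engineering, Intelligent Information Engineering Course

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract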

In most existing action quality assessment (AQA) methods, how to score simple actions in short-term sport videos has been widely explored. Recently, a few studies have attempted to solve the AQA problem of long-duration activity by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and unexplainable. In this work, we propose a novel graph convolution network based on multiple skeleton structure modelling to address the problem of effective pose feature learning to improve the performance of AQA in complex activity. Specifically, three kinds of skeleton structures, including the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments are conducted to verify the effectiveness of the proposed method. The experimental results show that our method achieves state-of-the-art performance.

Original language	English
Pages (from-to)	21692-21705
Number of pages	14
Journal	Applied Intelligence
Volume	53
Issue number	19
DOI	https://doi.org/10.1007/s10489-023-04613-5
Publication status	Published - Oct 2023

ASJC Scopus subject areas

Artificial Intelligence

Access to Document

10.1007/s10489-023-04613-5

Cite this

@article{bc27abda23e1432f9b8cc9561ec59fd6,

title = "Multi-skeleton structures graph convolutional network for action quality assessment in long videos",

abstract = "In most existing action quality assessment (AQA) methods, how to score simple actions in short-term sport videos has been widely explored. Recently, a few studies have attempted to solve the AQA problem of long-duration activity by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and unexplainable. In this work, we propose a novel graph convolution network based on multiple skeleton structure modelling to address the problem of effective pose feature learning to improve the performance of AQA in complex activity. Specifically, three kinds of skeleton structures, including the joints{\textquoteright} self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments are conducted to verify the effectiveness of the proposed method. The experimental results show that our method achieves state-of-the-art performance.",

keywords = "Action quality assessment, Graph convolutional network, Long sport videos",

author = "Qing Lei and Huiying Li and Hongbo Zhang and Jixiang Du and Shangce Gao",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2023",

month = oct,

doi = "10.1007/s10489-023-04613-5",

language = "English",

volume = "53",

pages = "21692--21705",

journal = "Applied Intelligence",

issn = "0924-669X",

publisher = "Springer Netherlands",

number = "19",

}

TY - JOUR

T1 - Multi-skeleton structures graph convolutional network for action quality assessment in long videos

AU - Lei, Qing

AU - Li, Huiying

AU - Zhang, Hongbo

AU - Du, Jixiang

AU - Gao, Shangce

PY - 2023/10

Y1 - 2023/10

N2 - In most existing action quality assessment (AQA) methods, how to score simple actions in short-term sport videos has been widely explored. Recently, a few studies have attempted to solve the AQA problem of long-duration activity by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and unexplainable. In this work, we propose a novel graph convolution network based on multiple skeleton structure modelling to address the problem of effective pose feature learning to improve the performance of AQA in complex activity. Specifically, three kinds of skeleton structures, including the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments are conducted to verify the effectiveness of the proposed method. The experimental results show that our method achieves state-of-the-art performance.

AB - In most existing action quality assessment (AQA) methods, how to score simple actions in short-term sport videos has been widely explored. Recently, a few studies have attempted to solve the AQA problem of long-duration activity by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and unexplainable. In this work, we propose a novel graph convolution network based on multiple skeleton structure modelling to address the problem of effective pose feature learning to improve the performance of AQA in complex activity. Specifically, three kinds of skeleton structures, including the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments are conducted to verify the effectiveness of the proposed method. The experimental results show that our method achieves state-of-the-art performance.

KW - Action quality assessment

KW - Graph convolutional network

KW - Long sport videos

UR - http://www.scopus.com/inward/record.url?scp=85161389494&partnerID=8YFLogxK

U2 - 10.1007/s10489-023-04613-5

DO - 10.1007/s10489-023-04613-5

M3 - Article

AN - SCOPUS:85161389494

SN - 0924-669X

VL - 53

SP - 21692

EP - 21705

JO - Applied Intelligence

JF - Applied Intelligence

IS - 19

ER -
