TY - JOUR
T1 - Learning disentangled representations for controllable human motion prediction
AU - Gu, Chunzhi
AU - Yu, Jun
AU - Zhang, Chao
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/2
Y1 - 2024/2
N2 - Generative model-based motion prediction techniques have recently enabled the prediction of controlled human motions, such as predicting multiple upper-body motions paired with similar lower-body motions. However, to achieve this, state-of-the-art methods require either subsequently learning mapping functions to find similar motions or repetitively training the model to enable control over the desired portion of the body. In this paper, we propose a novel framework to learn disentangled representations for controllable human motion prediction. Our task is to predict multiple future human motions based on the past observed sequence, while controlling partial-body movements. Our network involves a conditional variational auto-encoder (CVAE) architecture to model full-body human motion, and an extra CVAE path to learn only the corresponding partial-body (e.g., lower-body) motion. Specifically, the inductive bias imposed by the extra CVAE path encourages the two latent variables in the two paths to govern separate representations for each partial-body motion. With a single training run, our model provides two types of control over the generated human motions: (i) strict control of one portion of the human body and (ii) adaptive control of the other portion, by sampling from a pair of latent spaces. Additionally, we extend and adapt a sampling strategy to our trained model to diversify the controllable predictions. Our framework also potentially allows new forms of control by flexibly customizing the input for the extra CVAE path. Extensive experimental results and ablation studies demonstrate that our approach achieves state-of-the-art controllable human motion prediction, both qualitatively and quantitatively.
AB - Generative model-based motion prediction techniques have recently enabled the prediction of controlled human motions, such as predicting multiple upper-body motions paired with similar lower-body motions. However, to achieve this, state-of-the-art methods require either subsequently learning mapping functions to find similar motions or repetitively training the model to enable control over the desired portion of the body. In this paper, we propose a novel framework to learn disentangled representations for controllable human motion prediction. Our task is to predict multiple future human motions based on the past observed sequence, while controlling partial-body movements. Our network involves a conditional variational auto-encoder (CVAE) architecture to model full-body human motion, and an extra CVAE path to learn only the corresponding partial-body (e.g., lower-body) motion. Specifically, the inductive bias imposed by the extra CVAE path encourages the two latent variables in the two paths to govern separate representations for each partial-body motion. With a single training run, our model provides two types of control over the generated human motions: (i) strict control of one portion of the human body and (ii) adaptive control of the other portion, by sampling from a pair of latent spaces. Additionally, we extend and adapt a sampling strategy to our trained model to diversify the controllable predictions. Our framework also potentially allows new forms of control by flexibly customizing the input for the extra CVAE path. Extensive experimental results and ablation studies demonstrate that our approach achieves state-of-the-art controllable human motion prediction, both qualitatively and quantitatively.
KW - Deep generative model
KW - Disentanglement learning
KW - Stochastic motion prediction
UR - http://www.scopus.com/inward/record.url?scp=85173438049&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2023.109998
DO - 10.1016/j.patcog.2023.109998
M3 - Article
AN - SCOPUS:85173438049
SN - 0031-3203
VL - 146
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 109998
ER -