Speech discrimination in real-world group communication using audio-motion multimodal sensing

Takayuki Nozawa*, Mizuki Uchiyama, Keigo Honda, Tamio Nakano, Yoshihiro Miyake

*Corresponding author of this paper

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Speech discrimination, which determines whether a participant is speaking at a given moment, is essential for investigating human verbal communication. In dynamic real-world situations where multiple people participate in and form groups in the same space, simultaneous speakers make speech discrimination based solely on audio sensing difficult. In this study, we focused on physical activity during speech and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Utterance and physical activity data of students in a university participatory class were therefore recorded using smartphones worn around their necks. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities across wide frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than with both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than with the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications.
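As a purely illustrative sketch (not the authors' implementation), the following Python snippet shows one way window-level audio-motion feature fusion could feed a per-participant speech/non-speech classifier, as described in the abstract. The feature set, window length, sampling rates, and the random-forest model are all assumptions made for illustration; the paper's actual features and classifier may differ.

```python
# Hypothetical sketch of audio-motion fusion for speech/non-speech
# classification. All feature and model choices are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def window_features(audio, accel, sr_audio=16000, sr_accel=50, win_s=1.0):
    """Compute simple per-window features from raw audio and 3-axis acceleration."""
    n_win = int(min(len(audio) / (sr_audio * win_s),
                    len(accel) / (sr_accel * win_s)))
    feats = []
    for i in range(n_win):
        a = audio[int(i * sr_audio * win_s):int((i + 1) * sr_audio * win_s)]
        m = accel[int(i * sr_accel * win_s):int((i + 1) * sr_accel * win_s)]
        mag = np.linalg.norm(m, axis=1)           # acceleration magnitude
        feats.append([
            np.sqrt(np.mean(a ** 2)),             # audio RMS energy
            np.mean(np.abs(np.diff(a))),          # rough spectral-change proxy
            np.mean(mag), np.std(mag),            # motion level and variability
        ])
    return np.array(feats)

# Synthetic stand-in data for one participant (real data would come from the
# neck-worn smartphone's microphone and accelerometer, with manual labels).
rng = np.random.default_rng(0)
audio = rng.normal(size=16000 * 600)       # 10 min of audio at 16 kHz
accel = rng.normal(size=(50 * 600, 3))     # 10 min of 3-axis accel at 50 Hz
labels = rng.integers(0, 2, size=600)      # per-window speaking / not speaking

X = window_features(audio, accel)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Dropping either the audio or the motion columns from the feature matrix gives the single-modality baselines against which a combined classifier like the one in the study would be compared.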

Original language: English
Article number: 2948
Journal: Sensors
Volume: 20
Issue number: 10
DOI
Publication status: Published - 2020/05/02

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering
