Incorporating interpersonal synchronization features for automatic emotion recognition from visual and audio data during communication

Jingyu Quan, Yoshihiro Miyake, Takayuki Nozawa*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

During social interaction, humans recognize others' emotions from both individual features and interpersonal features. However, most previous automatic emotion recognition techniques have used only individual features and have not tested the importance of interpersonal features. In the present study, we asked whether interpersonal features, especially time-lagged synchronization features, benefit the performance of automatic emotion recognition. We explored this question in a main experiment (speaker-dependent emotion recognition) and a supplementary experiment (speaker-independent emotion recognition) by building an individual framework and an interpersonal framework for the visual, audio, and cross-modal settings. In the main experiment, the interpersonal framework outperformed the individual framework in every modality. In the supplementary experiment, the interpersonal framework also led to better performance, even for unknown communication pairs. We therefore conclude that interpersonal features are useful for boosting the performance of automatic emotion recognition tasks, and we hope this study draws attention to interpersonal features.
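The abstract does not specify how the time-lagged synchronization features are computed; a common way to capture such interpersonal coupling is the lagged Pearson correlation between the two speakers' feature time series. The sketch below is an illustration of that general idea, not the authors' implementation; the feature tracks, sampling rate, and lag range are assumptions for the example.

```python
# Minimal sketch: lagged Pearson correlation between two speakers' feature
# tracks (e.g., a facial action-unit intensity or a prosodic feature sampled
# at the same rate), yielding one synchronization value per lag.
import numpy as np

def lagged_sync_features(x: np.ndarray, y: np.ndarray, max_lag: int) -> np.ndarray:
    """Correlations between x and y for lags -max_lag..+max_lag.

    A positive lag correlates x[t] with y[t + lag], i.e., speaker x leading y.
    x, y: 1-D arrays of equal length (one feature track per speaker).
    """
    assert x.shape == y.shape and x.ndim == 1
    feats = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        # Constant segments make the correlation undefined; fall back to 0.
        if a.std() == 0 or b.std() == 0:
            feats.append(0.0)
        else:
            feats.append(float(np.corrcoef(a, b)[0, 1]))
    return np.array(feats)

# Example: two 10 Hz feature tracks from a 30 s segment, scanning +/-1 s of lag.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = np.roll(x, 5) + 0.5 * rng.standard_normal(300)  # y roughly follows x by 5 frames
sync_vec = lagged_sync_features(x, y, max_lag=10)
print(sync_vec.argmax() - 10)  # recovered lead-lag, expected near +5
```

The resulting vector of per-lag correlations could then be concatenated with the individual (per-speaker) features before classification, which is the general role interpersonal features play in the frameworks described above.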

Original language: English
Article number: 5317
Journal: Sensors
Volume: 21
Issue number: 16
DOIs
State: Published - 2021/08

Keywords

  • Affective computing
  • Classification
  • Communication
  • Deep neural networks
  • Emotion recognition
  • Interpersonal features
  • Multimodal

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering
