Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks

Churamani N., KALKAN S., Güneş H.

IEEE International Conference on Automatic Face and Gesture Recognition, India, 15 - 18 December 2021

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/fg52635.2021.9666978
  • Country: India
  • Middle East Technical University Affiliated: Yes


Most state-of-the-art approaches for Facial Action Unit (AU) detection rely on evaluating static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are more subtle and evolve over time, requiring AU detection models to learn spatial as well as temporal information. In this work, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. We propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) for AU detection using both frame- and sequence-level features. At the frame level, the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations; at the sequence level, the model learns temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on popular benchmarks, namely the BP4D and GFT datasets, obtaining state-of-the-art results on both.