Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging


Bergman N., Yildirim I. O., ŞAHİN A. B., ALTAN H., Yitzhaky Y.

Sensors, vol. 26, no. 4, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 26 Issue: 4
  • Publication Date: 2026
  • DOI: 10.3390/s26041341
  • Journal Name: Sensors
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, MEDLINE, Directory of Open Access Journals
  • Keywords: cross-modal fusion, deep learning, facial biometrics, knowledge distillation, multi-task learning, terahertz imaging, THz facial reconstruction
  • Middle East Technical University Affiliated: Yes

Abstract

Terahertz (THz) sub-millimeter-wave imaging offers unique capabilities for stand-off biometrics through concealing materials, yet it suffers from severe sparsity, low resolution, and high noise. To address these limitations, we introduce a novel unified Multi-Task Learning (MTL) network centered on a custom shared U-Net-like THz data encoder. This network simultaneously solves three distinct critical tasks on concealed THz facial data, given a limited dataset of approximately 1400 THz facial images of 20 different identities. The tasks comprise concealed face verification, facial posture classification, and generative reconstruction of unconcealed faces from concealed ones. While the MTL network alone provides highly successful results on this very challenging dataset, we further expanded the architecture via a cross-modal teacher-student approach. During training, a privileged visible-spectrum teacher fuses limited visible features with THz data to guide a THz-only student; this distillation yields a student network that relies solely on THz inputs at inference. The cross-modally trained student achieves a latent space with better inter-class separability than the single-modality baseline, though with reduced intra-class compactness, while maintaining similar task performance. Both the THz-only and distilled models preserve high generative fidelity for unconcealed faces.
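The training objective described in the abstract combines three task losses over a shared encoder with a cross-modal distillation term that pulls the student's latent features toward the privileged teacher's. The sketch below illustrates one plausible form of such a combined loss on toy data; all shapes, loss choices (cross-entropy, MSE feature matching), and the weighting `lam` are illustrative assumptions, not the authors' actual implementation.

```python
# Toy sketch of a multi-task + cross-modal distillation objective.
# Hypothetical: loss forms, dimensions, and weights are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

# Latent features from the shared encoder (dimension assumed):
student_latent = rng.normal(size=64)   # THz-only student
teacher_latent = rng.normal(size=64)   # privileged visible+THz teacher

# Toy outputs of the three task heads:
verif_logits   = rng.normal(size=20)           # 20 identities (from the dataset)
posture_logits = rng.normal(size=5)            # assumed number of posture classes
recon, target  = rng.normal(size=(2, 32, 32))  # generative face reconstruction

def total_loss(lam=0.5):
    """Sum of the three task losses plus a weighted feature-matching
    distillation term (assumed form of the combined objective)."""
    l_verif   = cross_entropy(verif_logits, label=3)
    l_posture = cross_entropy(posture_logits, label=1)
    l_recon   = mse(recon, target)
    l_distill = mse(student_latent, teacher_latent)  # teacher-student term
    return l_verif + l_posture + l_recon + lam * l_distill

print(total_loss())
```

At inference the distillation term is dropped and only the student's THz branch runs, which matches the abstract's claim that the distilled student relies solely on THz inputs.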