IEEE Transactions on Intelligent Vehicles, vol. 10, no. 9, pp. 4397-4408, 2025 (SCI-Expanded, Scopus)
Recent progress in autonomous driving relies primarily on sensor-rich systems, encompassing radars, LiDARs, and advanced cameras, to perceive the environment. Human drivers, however, demonstrate an impressive ability to drive based solely on visual perception. This study introduces an end-to-end method for predicting the steering angle and vehicle speed exclusively from a monocular camera image. Alongside the color image, which conveys scene texture and appearance details, a monocular depth image and a semantic segmentation image are internally derived and incorporated, providing spatial and semantic structure of the environment and yielding three input images in total. In addition, LSTM units are employed to capture temporal features. The proposed model improves RMSE over the state of the art by 44.96% for the steering angle and 4.39% for the speed on the Udacity dataset. Furthermore, tests on the CARLA and Sully Chen datasets yield results that outperform those reported in the literature. Extensive ablation studies are also conducted to demonstrate the contribution of each component. These findings highlight the potential of self-driving systems that rely on visual input alone.
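To make the described pipeline concrete, the following is a minimal PyTorch sketch of a three-stream architecture with an LSTM head, written purely from the abstract's description. It is not the authors' implementation: the `depth_head` and `seg_head` modules are hypothetical stand-ins for the monocular depth and semantic segmentation networks the paper derives internally, and the encoder sizes, class count, and output ordering are illustrative assumptions.

```python
# Sketch (not the paper's code): RGB frame -> internally derived depth and
# segmentation maps -> three CNN encoders -> LSTM over the frame sequence ->
# (steering angle, speed). All module choices here are assumptions.
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """Small CNN mapping one input image (RGB, depth, or segmentation) to a feature vector."""
    def __init__(self, in_ch: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 24, 5, stride=2), nn.ELU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ELU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ELU(),
            nn.Conv2d(48, 64, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ELU(),
        )

    def forward(self, x):
        return self.net(x)


class EndToEndDriver(nn.Module):
    """RGB + derived depth + derived segmentation, fused and fed to an LSTM."""
    def __init__(self, feat_dim: int = 128, hidden: int = 256, num_classes: int = 19):
        super().__init__()
        # Hypothetical stand-ins for the internally derived modalities; the paper
        # would use dedicated monocular depth / semantic segmentation networks here.
        self.depth_head = nn.Conv2d(3, 1, 3, padding=1)           # pseudo depth map
        self.seg_head = nn.Conv2d(3, num_classes, 3, padding=1)   # pseudo class scores
        self.enc_rgb = StreamEncoder(3, feat_dim)
        self.enc_depth = StreamEncoder(1, feat_dim)
        self.enc_seg = StreamEncoder(num_classes, feat_dim)
        self.lstm = nn.LSTM(3 * feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)                            # [steering angle, speed]

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        x = frames.reshape(b * t, c, h, w)
        depth = self.depth_head(x)                 # derive depth internally
        seg = self.seg_head(x)                     # derive segmentation internally
        feats = torch.cat(
            [self.enc_rgb(x), self.enc_depth(depth), self.enc_seg(seg)], dim=1
        )
        seq, _ = self.lstm(feats.reshape(b, t, -1))
        return self.out(seq[:, -1])                # predict from the last time step


# Example: a batch of 2 clips, 5 frames each, 66x200 images (sizes are illustrative).
model = EndToEndDriver()
pred = model(torch.randn(2, 5, 3, 66, 200))
print(pred.shape)  # torch.Size([2, 2])
```

The key design point the sketch illustrates is that the depth and segmentation images are produced from the same monocular frame rather than from extra sensors, so the fused representation remains vision-only while the LSTM supplies the temporal context.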