A Unified Monocular Vision-Based Driving Model for Autonomous Vehicles With Multi-Task Capabilities


AZAK S., Bozkaya F., Tığlıoğlu Ş., Yusefi A., Durdu A.

IEEE Transactions on Intelligent Vehicles, vol.10, no.9, pp.4397-4408, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 10 Issue: 9
  • Publication Date: 2025
  • DOI: 10.1109/TIV.2024.3483114
  • Journal Name: IEEE Transactions on Intelligent Vehicles
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.4397-4408
  • Keywords: Autonomous driving, end-to-end learning, multi-task learning, self-driving car, steering estimation
  • Middle East Technical University Affiliated: Yes

Abstract

Recent progress in autonomous driving relies primarily on sensor-rich systems, encompassing radars, LiDARs, and advanced cameras, to perceive the environment. However, human drivers demonstrate an impressive ability to drive based solely on visual perception. This study introduces an end-to-end method for predicting the steering angle and vehicle speed exclusively from a monocular camera image. Alongside the color image, which conveys scene texture and appearance details, a monocular depth image and a semantic segmentation image are internally derived and incorporated, providing information about the spatial and semantic structure of the environment. This yields a total of three input images. Moreover, LSTM units are employed to capture temporal features. The proposed model demonstrates a significant enhancement in RMSE compared to the state of the art, achieving an improvement of 44.96% for the steering angle and 4.39% for the speed on the Udacity dataset. Furthermore, tests on the CARLA and Sully Chen datasets yield results that outperform those reported in the literature. Extensive ablation studies are also conducted to demonstrate the effectiveness of each component. These findings highlight the potential of self-driving systems using visual input alone.
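The headline figures are relative RMSE reductions over a baseline. As a minimal sketch of how such a percentage is computed (the baseline and model RMSE values below are hypothetical illustrations, not numbers taken from the paper):

```python
def rmse_improvement(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction of a model over a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Hypothetical values chosen for illustration only:
print(round(rmse_improvement(0.0912, 0.0502), 2))  # → 44.96
```

A 44.96% improvement thus means the model's steering-angle RMSE is roughly 55% of the baseline's, not that the absolute error dropped by 44.96 units.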