Reinforcement Learning versus Conventional Control for Controlling a Planar Bi-rotor Platform with Tail Appendage


Ugurlu H. I., Kalkan S., Saranli A.

Journal of Intelligent and Robotic Systems: Theory and Applications, vol.102, no.4, 2021 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 102 Issue: 4
  • Publication Date: 2021
  • DOI Number: 10.1007/s10846-021-01412-3
  • Title of Journal: Journal of Intelligent and Robotic Systems: Theory and Applications
  • Keywords: Flight control, Tail appendage, Conventional control, Deep reinforcement learning

Abstract

© 2021, The Author(s), under exclusive licence to Springer Nature B.V.

In this paper, we study conventional and learning-based control approaches for multi-rotor platforms, with and without an actuated “tail” appendage. One contribution is a comprehensive experimental comparison between proven control-theoretic approaches and more recent learning-based ones. The actuated tail appendage is considered as a deviation from the typical multi-rotor morphology: it complicates the control problem but promises some useful applications. As another contribution, our study explores the impact of such an actuated tail on overall position control for both the conventional and the learning-based controllers. For the conventional control part, we used a multi-loop architecture in which the inner loop regulates the attitude while the outer loop controls the position of the platform. For the learning controller, a multi-layer neural network architecture is used to learn a nonlinear state-feedback controller. To improve the learning and generalization performance of this controller, we adopted a curriculum learning approach, which gradually increases the difficulty of the training samples. For the experiments, a planar bi-rotor platform is modeled in a 2D simulation environment. The planar model avoids mathematical complications while preserving the main attributes of the problem, so the results remain useful. We observe that both types of controllers achieve reasonable control performance and can solve the position control task. However, neither one shows a clear advantage over the other. The learning-based controller is not intuitive, and the system suffers from long training times. The architecture of the multi-loop controller is handcrafted (which is not required for the learning-based controller) but provides guaranteed stable behavior.
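The multi-loop idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' controller or simulator: it assumes a textbook planar bi-rotor model without the tail appendage, a PD law in each loop, explicit Euler integration, and hypothetical mass, inertia, arm length, thrust limit, and gains chosen only so that the inner attitude loop is noticeably faster than the outer position loop.

```python
import math

# Hypothetical parameters for illustration only; none are taken from the paper.
M, INERTIA, ARM, G = 1.0, 0.01, 0.2, 9.81  # mass [kg], inertia [kg m^2], arm [m], gravity [m/s^2]
F_MAX = 15.0                               # assumed per-rotor thrust limit [N]


def cascaded_pd(state, x_ref, z_ref):
    """Outer position loop feeding an inner attitude loop; returns rotor thrusts."""
    x, z, theta, vx, vz, omega = state
    # Outer loop: desired accelerations from position and velocity errors.
    ax_des = 2.0 * (x_ref - x) - 2.5 * vx
    az_des = 4.0 * (z_ref - z) - 4.0 * vz
    # Small-angle mapping from lateral acceleration to a bounded tilt reference.
    theta_des = max(-0.5, min(0.5, -ax_des / G))
    # Total thrust that yields the desired vertical acceleration at the current tilt.
    thrust = M * (G + az_des) / math.cos(theta)
    # Inner loop: PD on attitude, tuned faster than the outer loop.
    torque = INERTIA * (100.0 * (theta_des - theta) - 20.0 * omega)
    # Split total thrust and torque between the left and right rotors, then saturate.
    f_left = 0.5 * (thrust - torque / ARM)
    f_right = 0.5 * (thrust + torque / ARM)
    return (max(0.0, min(F_MAX, f_left)), max(0.0, min(F_MAX, f_right)))


def step(state, f_left, f_right, dt):
    """One explicit Euler step of the assumed planar rigid-body dynamics."""
    x, z, theta, vx, vz, omega = state
    thrust = f_left + f_right
    torque = ARM * (f_right - f_left)      # sign convention assumed: positive tilts left
    ax = -(thrust / M) * math.sin(theta)
    az = (thrust / M) * math.cos(theta) - G
    alpha = torque / INERTIA
    return (x + vx * dt, z + vz * dt, theta + omega * dt,
            vx + ax * dt, vz + az * dt, omega + alpha * dt)


def simulate(x_ref=1.0, z_ref=1.0, duration=10.0, dt=0.002):
    """Fly from rest at the origin toward (x_ref, z_ref); returns the final state."""
    state = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    for _ in range(round(duration / dt)):
        f_left, f_right = cascaded_pd(state, x_ref, z_ref)
        state = step(state, f_left, f_right, dt)
    return state
```

Calling `simulate()` returns the final `(x, z, theta, vx, vz, omega)` tuple. In this sketch the outer loop never commands the rotors directly; it only sets a tilt reference and a total thrust, which is the handcrafted structure the abstract contrasts with a learned state-feedback policy.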