44th European Rotorcraft Forum 2018, ERF 2018, Delft, Netherlands, 18 - 21 September 2018, vol.2, pp.917-924
Copyright © 2018 by author(s).
This study presents an application of an actor-critic reinforcement learning method to the nonlinear problem of helicopter guidance during autorotation, with the goal of achieving a safe landing following engine power loss. A point-mass model of an OH-58A helicopter was built to simulate autorotation dynamics. The model includes the equations of motion in the vertical plane. Its states are the horizontal and vertical velocities, the horizontal and vertical positions, the rotor angular speed, and the horizontal and vertical components of the rotor thrust coefficient. The inputs to the model were chosen to be the rates of change of the horizontal and vertical components of the rotor thrust coefficient. A reinforcement learning agent was trained with a model-free asynchronous actor-critic algorithm, with training episodes parallelized on a multi-core CPU. The objective of the training was defined as achieving near-zero horizontal and vertical kinetic energies at the instant of touchdown. Each training episode was defined as the autorotative flight from an initial equilibrium flight condition to touchdown. During each training episode, the agent was presented with a reward at each discrete time step according to a multi-conditional reward function. Constraints on the rotor angular speed, the rotor disk orientation, and the rotor thrust coefficient were implemented by structuring the reward function accordingly. At touchdown, the reward function was programmed to output a weighted sum of the squared vertical and horizontal velocities. The majority of the reinforcement signal came from this touchdown reward, as it measures the agent's success in accomplishing the safe autorotation landing task. The agent consists of two separate neural network function approximators, namely the actor and the critic. The critic approximates the value of a set of states.
The actor generates a set of actions given a set of states; the actions are sampled from a Gaussian distribution whose mean values are the outputs of the actor network. Updates to the parameters of both networks were calculated by an n-step returns scheme, which accumulates gradients from individual time steps into large, once-per-episode updates to improve training stability. The RMSProp algorithm was used for optimization. After training was complete, the agent was tested against different initial conditions both inside and outside the height-velocity (H-V) avoidance region of the standard OH-58A helicopter at maximum gross weight. The results achieved by the agent indicate that the method is well suited for guiding the helicopter safely to the ground in a closed-loop manner over a large space of initial conditions. The controls generated by the reinforcement learning agent were found to be similar to a helicopter pilot's technique during autorotative flight. The study demonstrates that a helicopter's H-V restriction zone can be reduced significantly using the presented reinforcement learning method for autonomous landing of a helicopter in autorotation.
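
As an illustration of the n-step returns scheme and the touchdown reward described in the abstract, the following is a minimal sketch rather than the authors' implementation. The function names, the discount factor `gamma`, the weights `k_u` and `k_w`, and the negative sign on the touchdown reward (so that lower touchdown velocities score higher) are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List


def touchdown_reward(u: float, w: float, k_u: float = 1.0, k_w: float = 1.0) -> float:
    """Hypothetical terminal reward at touchdown.

    u, w     : horizontal and vertical velocities at touchdown
    k_u, k_w : assumed weights (not given in the abstract)
    The negative sign is an assumption: it makes smaller touchdown
    velocities yield a larger reward.
    """
    return -(k_u * u ** 2 + k_w * w ** 2)


def n_step_returns(rewards: List[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns for every time step of one episode.

    rewards         : per-step rewards collected during the episode
    bootstrap_value : critic estimate for the state after the last step
                      (0.0 when the episode ended at touchdown)
    gamma           : assumed discount factor
    """
    returns: List[float] = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

In an actor-critic update, each return would typically be compared with the critic's value estimate for the same time step to form an advantage, and the resulting gradients would be accumulated over the episode before a single optimizer step is applied.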