A Reinforcement Learning Approach to the 2v2 Beyond Visual Range Air Combat Maneuvering Problem