Abstract
In most high-risk applications, interpretability is crucial for ensuring system safety and trust. However, existing research often relies on hard-to-understand, highly parameterized models such as neural networks. In this paper, we focus on the problem of policy search in continuous observation and action spaces, and we leverage two graph-based Genetic Programming (GP) techniques, namely Cartesian Genetic Programming (CGP) and Linear Genetic Programming (LGP), to develop efficient yet interpretable control policies. Our experimental evaluation on eight benchmarks from the MuJoCo suite shows results competitive with those of state-of-the-art Reinforcement Learning (RL) algorithms. Moreover, our policies, represented as small graphs, are simple and transparent, paving the way for trustworthy AI in the domain of continuous control.