Cart pole system stabilization with Unity and ML Agents - Pt 1

July 7, 2021

Series of tutorials

This is a series of posts describing how to create an inverted pendulum (or cart pole system) with Unity and ML Agents.

The entire series will include:

Simulating its physics;
Stabilizing the pendulum;
Swinging it up;
To be defined.

Assumptions

To follow this series, I assume that:

You have Unity installed
You have Python installed
You know the basics of each tool

What is the idea?

If you have no idea what a cart pole system is, you can start by reading this note. The idea here is to create an algorithm that controls the cart pole system. In control systems, this is usually done with linear controllers such as PID or LQR (Linear Quadratic Regulator), both designed around the upright position, or with non-linear approaches when the system is far from that position.

We can break the cart pole control problem down into two parts:

Stabilization

This is the stage where we can approach the problem with a linear controller, like PID, because the angle only varies by a small amount around the upright position.
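To see why a small angle makes the problem linear, here is a sketch of the pendulum's equation of motion. It assumes a point mass on a massless rod of length ℓ pivoted on the cart, with θ measured from the upright position and ẍ the cart's acceleration; these modeling choices are mine and not necessarily the ones used later in the series. Replacing sin θ with θ and cos θ with 1 leaves an equation that is linear in θ and ẍ:

```latex
\ell\,\ddot{\theta} = g\sin\theta - \ddot{x}\cos\theta
\quad\xrightarrow{\;\sin\theta \approx \theta,\ \cos\theta \approx 1\;}\quad
\ell\,\ddot{\theta} \approx g\,\theta - \ddot{x}
```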

If you consider the pendulum at the top as 0°, then the controller will try to keep the pendulum within about 14° of 0°, a region where the dynamics are approximately linear (sin θ ≈ θ). You can read more about the small-angle approximation if you are interested.
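A quick way to see where a figure like ~14° comes from is to compute the relative error of replacing sin θ with θ. The small Python sketch below does that (the function name `small_angle_error` is my own, not part of Unity or ML-Agents); at 14° the error is about 1%, and it grows quickly at larger angles.

```python
import math


def small_angle_error(deg: float) -> float:
    """Relative error of approximating sin(theta) by theta, for an angle in degrees."""
    theta = math.radians(deg)
    return abs(theta - math.sin(theta)) / math.sin(theta)


for deg in (5, 10, 14, 20, 30):
    print(f"{deg:>2} deg: {100 * small_angle_error(deg):.2f}% error")
```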

So when the pendulum is in this region, we can implement solutions like PID to maintain the pendulum at 0°.
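As an illustration of the stabilization stage, here is a minimal PID sketch in Python. It is not the Unity setup from this series: it runs on the linearized model ℓθ̈ ≈ gθ − ẍ (a point mass on a massless rod), treats the cart's acceleration as the control input, and uses hypothetical gains and rod length. Note the sign of the error: a pendulum leaning towards +θ needs the cart to accelerate towards +x, so the error is taken as measurement minus setpoint.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PID:
    """Minimal discrete PID controller (textbook form, no anti-windup)."""
    kp: float
    ki: float
    kd: float
    setpoint: float = 0.0  # target angle in radians (0 = upright)
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def update(self, measurement: float, dt: float) -> float:
        # Measurement minus setpoint: leaning to +theta must produce +x acceleration.
        error = measurement - self.setpoint
        self._integral += error * dt
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def simulate(theta0: float = 0.1, t_end: float = 5.0, dt: float = 0.001) -> float:
    """Run the PID on the linearized pendulum l*theta'' = g*theta - x''.

    Returns the final angle in radians. Gains and rod length are
    illustrative assumptions, not values from the Unity project.
    """
    g, length = 9.81, 1.0
    pid = PID(kp=40.0, ki=1.0, kd=10.0)
    theta, omega = theta0, 0.0
    for _ in range(int(t_end / dt)):
        cart_accel = pid.update(theta, dt)         # control input x''
        alpha = (g * theta - cart_accel) / length  # angular acceleration
        omega += alpha * dt                        # semi-implicit Euler step
        theta += omega * dt
    return theta


print(f"final angle: {simulate():.5f} rad")
```

In this model kp has to exceed g for the proportional term to overcome gravity; with the gains above, the angle starting at 0.1 rad settles close to 0 within a few seconds.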

Swing Up

This is the stage that is harder to approach with linear solutions, as the pendulum is far from its linear region, so non-linear controllers are typically used instead. You can also break the problem down into two parts: solve the swing-up with a non-linear approach, then, once the system reaches the linear region (~14°), switch to something like PID to stabilize it.

What about Reinforcement Learning?

The idea here is to use Unity to model a cart pole system, and use ML-Agents, on both the Unity and Python sides, to make the pendulum learn its own control. This way, we can avoid some of the pitfalls of the usual linear or non-linear control approaches, such as the need for a precise mathematical model of the system.

Jhonatan da Silva