# Results

# Modelling

- In progress

# Unity Model Analysis

With the system modelled in Unity, the model had to be analysed: parameters such as mass, drag and angular drag, of both the cart and the pendulum, were varied to check whether the model behaves as expected from the mathematical modelling.

The base parameters were:

| | Mass [kg] | Drag | Angular Drag |
|---|---|---|---|
| Cart | 1 | 0 | 0.05 |
| Pendulum | 1 | 0.5 | 0.05 |

The same force, with the characteristics of an impulse, was applied in all the tests.

## Cart parameters analysis

### Mass of the cart

The first simulation concerned the mass of the cart. Simulations were run with cart masses of 1 kg, 5 kg, 10 kg and 50 kg, and both the position of the cart and the angle of the pendulum were observed in each.

As expected, when the mass of the cart increases, the same force produces a smaller acceleration, so the cart travels a shorter distance. The same logic applies to the pendulum: as the mass of the cart increases, the displacement of the cart decreases, and the angle of the pendulum decreases with it.
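The mass dependence described above can be checked with a back-of-the-envelope calculation. The sketch below assumes an ideal impulse J applied to the cart alone, ignoring drag and the coupling with the pendulum, so the velocity gained is Δv = J/m. The impulse value of 10 N·s is a hypothetical placeholder, not the value used in the Unity tests.

```python
# Velocity gained by the cart from an ideal impulse: delta_v = J / m.
# Assumptions (not from the Unity project): drag and pendulum coupling are
# ignored, and IMPULSE is a hypothetical placeholder value.
IMPULSE = 10.0  # N*s, hypothetical

CART_MASSES = [1.0, 5.0, 10.0, 50.0]  # kg, the four simulated cases


def velocity_after_impulse(impulse, mass):
    """Return the velocity change of a free body hit by an ideal impulse."""
    return impulse / mass


velocities = {m: velocity_after_impulse(IMPULSE, m) for m in CART_MASSES}

for mass, v in velocities.items():
    print(f"mass = {mass:5.1f} kg -> delta_v = {v:.2f} m/s")
```

Under these assumptions the velocity gained scales as 1/m, so the 50 kg cart starts moving fifty times more slowly than the 1 kg cart for the same impulse.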

### Drag of the cart

Side note: since the cart and the pendulum are rigid bodies in Unity, both their drag and their angular drag can be adjusted.
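To build intuition for what the drag parameter does, the sketch below uses a per-step damping rule that is commonly reported for Unity rigid bodies, v ← v · max(0, 1 − drag · Δt). This rule is an assumption, not something confirmed by this project, and the initial velocity is hypothetical. The time step of 0.02 s is Unity's default fixed timestep, but the project may use another value.

```python
# Assumed damping rule (commonly reported for Unity, not verified here):
#   v <- v * max(0, 1 - drag * dt)   once per physics step.
DT = 0.02  # s, Unity's default fixed timestep; the project may differ


def simulate_velocity(v0, drag, steps, dt=DT):
    """Return the velocity after each physics step with no applied force."""
    factor = max(0.0, 1.0 - drag * dt)
    history = [v0]
    for _ in range(steps):
        history.append(history[-1] * factor)
    return history


# Hypothetical initial velocity of 1 m/s, simulated for 1 s (50 steps).
no_drag = simulate_velocity(1.0, drag=0.0, steps=50)
with_drag = simulate_velocity(1.0, drag=2.0, steps=50)

print(f"drag = 0: v(1 s) = {no_drag[-1]:.3f} m/s")
print(f"drag = 2: v(1 s) = {with_drag[-1]:.3f} m/s")
```

With zero drag the velocity is unchanged, while any positive drag makes it decay roughly exponentially, which is the qualitative behaviour to look for in the cart position plots.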

## Pendulum parameter analysis

### Length of the pendulum

### Angular drag of the pendulum

# Model training - Stabilization

```
mlagents-learn ./trainer_0.16.1.yaml --run-id stable_004 --resume
tensorboard --logdir summaries
```

The final mass, drag and angular drag parameters used for the training were:

| | Mass [kg] | Drag | Angular Drag |
|---|---|---|---|
| Cart | 5 | 2 | 0 |
| Pendulum | 1 | 0 | 0.5 |

The configuration file used for the training:

```
CartPole:
  trainer_type: ppo
  hyperparameters:
    batch_size: 64
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: true
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5.0e6
  time_horizon: 1000
  summary_freq: 1000
  threaded: true
```
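A few quantities implied by this configuration can be worked out directly. The sketch below is not ML-Agents code: it copies the values from the file above and assumes that `learning_rate_schedule: linear` means the learning rate decays linearly from its initial value towards zero at `max_steps` (the library may clamp it at a small minimum). The helper name `linear_learning_rate` is hypothetical.

```python
# Values copied from the training configuration.
BATCH_SIZE = 64
BUFFER_SIZE = 12000
LEARNING_RATE = 0.0003
GAMMA = 0.99
MAX_STEPS = 5_000_000


def linear_learning_rate(step):
    """Assumed linear decay: full rate at step 0, zero at MAX_STEPS."""
    remaining = max(0.0, 1.0 - step / MAX_STEPS)
    return LEARNING_RATE * remaining


# Number of mini-batches that fit in one buffer (per epoch).
batches_per_buffer = BUFFER_SIZE / BATCH_SIZE

# Rough horizon over which rewards still matter with discount GAMMA.
effective_horizon = 1.0 / (1.0 - GAMMA)

print(f"mini-batches per buffer: {batches_per_buffer}")
print(f"effective reward horizon: {effective_horizon:.0f} steps")
print(f"learning rate halfway: {linear_learning_rate(MAX_STEPS // 2):.2e}")
```

With these values the buffer holds 187.5 mini-batches, so `buffer_size` is not an exact multiple of `batch_size`, and the discount factor gives an effective reward horizon of about 100 steps.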

## Cumulative reward

Rationale

## Training response

Open questions: number of points in the graph? Sampling frequency?

## Parameter variation

The model was trained with the parameters in table x. To test whether the trained model is robust, the parameters of the cart and the pendulum were changed to check if it could still stabilize the system without retraining.
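One way to organise such a robustness test is a one-at-a-time sweep, where a single parameter is scaled while the others stay at their training values. The sketch below only builds the list of test cases; the scale factors and the helper name `one_at_a_time_cases` are hypothetical, and applying each case to the Unity scene is left out.

```python
# Training values from the final parameter table.
TRAINED = {
    ("cart", "mass"): 5.0,
    ("cart", "drag"): 2.0,
    ("cart", "angular_drag"): 0.0,
    ("pendulum", "mass"): 1.0,
    ("pendulum", "drag"): 0.0,
    ("pendulum", "angular_drag"): 0.5,
}

# Hypothetical scale factors; choose values that suit the experiment.
SCALE_FACTORS = [0.5, 2.0]


def one_at_a_time_cases(trained, factors):
    """Return parameter sets where exactly one non-zero value is scaled."""
    cases = []
    for key, value in trained.items():
        if value == 0.0:
            continue  # scaling zero changes nothing; set an absolute value instead
        for factor in factors:
            case = dict(trained)  # copy so the training values stay untouched
            case[key] = value * factor
            cases.append((key, factor, case))
    return cases


cases = one_at_a_time_cases(TRAINED, SCALE_FACTORS)
for (body, name), factor, case in cases:
    print(f"{body} {name} x{factor}: {case[(body, name)]}")
```

Parameters trained at zero (cart angular drag, pendulum drag) are skipped by this scheme and would need explicit absolute test values.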