OpenAI Environment Reward Tuning

A project log for Open Source Hardware with Machine Learning

Open source hardware environment for machine learning so that these algorithms can eventually be applied to more complex real life systems.

Alex · 7 days ago · 0 Comments

Quite a while ago I started creating a new OpenAI environment for the inverted pendulum. At the time it seemed easy, since all the physics from the CartPole environment would presumably carry over, but as it turns out they do not. By using a stepper motor on the hardware we changed the physics of the system: a stepper motor automatically damps the system because it fights external forces, such as the pole's reaction torque, to hold its current position. This meant the physics of the system had to be redone (fortunately, they actually came out simpler). The source code for the OpenAI environment can be found on my GitHub.
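To see why the stepper motor simplifies things, note that a position-controlled stepper follows its commanded trajectory regardless of what the pole does, so the pole reduces to a pendulum on a kinematically driven pivot. The sketch below illustrates that idea only; the parameter names, values, and sign conventions are my assumptions, not the project's actual code.

```python
import math

# Illustrative sketch: the cart is position-controlled by the stepper, so the
# pole never pushes the cart around -- only the cart's commanded acceleration
# couples into the pole. All constants here are assumed, not from the project.
G = 9.81   # gravity, m/s^2
L = 0.30   # distance from pivot to pole's center of mass, m (assumed)
DT = 0.02  # integration time step, s (assumed)

def pendulum_step(theta, theta_dot, cart_accel):
    """One Euler step; theta = 0 means the pole is upright.

    With the cart kinematically driven, the pivot's acceleration enters
    the pole dynamics only through the cos(theta) coupling term.
    """
    theta_ddot = (G * math.sin(theta) - cart_accel * math.cos(theta)) / L
    theta_dot = theta_dot + theta_ddot * DT
    theta = theta + theta_dot * DT
    return theta, theta_dot
```

With zero cart acceleration the upright position is an unstable equilibrium, which is exactly what makes the control problem interesting.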

Shortly after the physics were redone, I determined that the reward system used in CartPole would no longer work for an inverted pendulum that has to swing up from rest. To earn its first point, the algorithm would have to randomly stumble into the window between -15° and +15°, which takes WAY too long. To solve this problem I devised the equation plotted below, which gives the learning algorithm a gradient to follow in the correct direction.
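The post's actual equation is the one in the plot, so purely as an illustration of the same idea, a common shaping of this kind rewards every step in proportion to how upright the pole is, so the agent gets a signal long before it ever reaches the ±15° window:

```python
import math

# Illustrative only: NOT the post's actual reward equation (that one is in
# the plot above). This is just one common dense reward for swing-up tasks.
def shaped_reward(theta):
    """theta in radians, 0 = upright; returns a value in [-1, 1]."""
    return math.cos(theta)
```

A dense reward like this turns the sparse "stumble into the window" problem into one where every small improvement toward vertical is reinforced.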

Now I am building a reinforcement learning model that can reliably control the system. I started with a DQN, but it was quite slow and never learned the swing-up. I then tried a DDQN, which had the same problems. I am currently trying an Asynchronous Advantage Actor-Critic (A3C) model to see if it performs any better. This repo has been extremely helpful in my reinforcement learning algorithm development.