Quite a while ago I started creating a new OpenAI Gym environment for the inverted pendulum. At the time it seemed easy, since I assumed all the physics from the CartPole environment would be the same, but as it turns out they are not. By using a stepper motor on the hardware we changed the physics of the system: the stepper motor effectively damps the system because it fights external forces, such as those from the pole, to hold its current position. This meant all the physics of the system had to be redone (fortunately, it actually made the physics simpler). The source code for the OpenAI environment can be found on my GitHub.
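To show why the stepper simplifies things, here is a minimal sketch of one common way to model it. It is not the code from my repo. It assumes the stepper rigidly enforces whatever motion it is commanded, so the action is treated as a prescribed cart acceleration. Under that assumption the cart mass and the pole's reaction force on the cart drop out entirely, leaving only the pole's rotation to integrate. The pole is modeled as a uniform rod, using CartPole's half-length convention and explicit Euler steps. The names `PendulumState`, `step`, `POLE_HALF_LENGTH`, and the numeric values are hypothetical.

```python
import math
from dataclasses import dataclass

GRAVITY = 9.81            # m/s^2
POLE_HALF_LENGTH = 0.25   # m, hypothetical value (CartPole-style half-length)
TAU = 0.02                # s per step, same as CartPole's default

@dataclass
class PendulumState:
    x: float          # cart position (m)
    x_dot: float      # cart velocity (m/s)
    theta: float      # pole angle from upright (rad), positive = tilting toward +x
    theta_dot: float  # pole angular velocity (rad/s)

def step(state: PendulumState, cart_accel: float) -> PendulumState:
    """Advance one time step, assuming the stepper imposes cart_accel exactly."""
    # Uniform rod pivoting on an accelerating cart:
    # (4/3) * l * theta_ddot = g*sin(theta) - a*cos(theta)
    # No cart mass or force term appears because the motor holds the cart's motion.
    theta_acc = (GRAVITY * math.sin(state.theta)
                 - cart_accel * math.cos(state.theta)) / ((4.0 / 3.0) * POLE_HALF_LENGTH)
    # Explicit Euler integration, as in CartPole
    return PendulumState(
        x=state.x + TAU * state.x_dot,
        x_dot=state.x_dot + TAU * cart_accel,
        theta=state.theta + TAU * state.theta_dot,
        theta_dot=state.theta_dot + TAU * theta_acc,
    )
```

For example, `step(PendulumState(0, 0, 0, 0), 1.0)` accelerates the cart toward +x and gives the pole a negative angular velocity, so it starts falling the other way. With the pole upright and no commanded acceleration, nothing moves.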
Shortly after the physics were redone, I determined that the reward system used in CartPole would no longer work for an inverted pendulum. To earn even one point, the algorithm would have to randomly stumble into the window between -15° and +15°, which takes WAY too long. To solve this problem I devised this cool math equation (plotted below) to help steer the learning algorithm in the right direction.
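My actual equation is the one in the plot. As a stand-in illustration only, the sketch below contrasts the two ideas. `sparse_reward` is a CartPole-style reward that pays only inside the ±15° window. `shaped_reward` is a hypothetical smooth function, `(1 + cos θ) / 2`, that is 1 when upright, 0 when hanging straight down, and increases as the pole gets closer to upright. Both function names and the cosine form are assumptions chosen for illustration, not the equation I used.

```python
import math

UPRIGHT_BAND = math.radians(15)  # +/-15 deg window described above

def wrap_angle(theta: float) -> float:
    """Wrap an angle to (-pi, pi] so that 0 always means upright."""
    return math.atan2(math.sin(theta), math.cos(theta))

def sparse_reward(theta: float) -> float:
    """CartPole-style: one point only when the pole is inside the upright band."""
    return 1.0 if abs(wrap_angle(theta)) <= UPRIGHT_BAND else 0.0

def shaped_reward(theta: float) -> float:
    """Hypothetical dense reward: 1 upright, 0 hanging down, smooth in between."""
    # cos() is periodic, so no explicit wrapping is needed here
    return 0.5 * (1.0 + math.cos(theta))
```

At 90° the sparse reward is still 0, while the shaped reward is already 0.5. That gradient is what gives the learner a signal long before it ever reaches the upright window.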