We started our machine learning algorithm on some Custom OpenAI Environments to minimize algorithm development time and also to shorten the time spent training on hardware. The training is broken into several phases. Each phase aims to improve the AI in some way, just as a person learning a sport would do practice drills. The first phase focuses on flipping the pole up, and balancing the pole once it is up, separately. The second phase tries to combine the two actions into a single event. The third phase is the hardest, it flips the pole up, balances it and then deals with random force injections into the system, simulating wind, or a person instantaneously adding force to the cart/pole.