Fido

Description

We built an AI, nicknamed "Fido," that allows any robot to learn organically over its lifetime rather than having to be preprogrammed. Humans can train Fido by giving reward, similarly to how humans train dogs using treats.

Fido uses a modified version of the Q-learning reinforcement learning algorithm coupled with an neural network and a wire-fitted interpolator. It follows an intelligent action selection policy, efficiently adjusts its model architecture, and trains itself on samples of past experiences. These machine learning innovations allow Fido to learn 4x faster than the industry standard, making Fido practical.

To test Fido, we built a simulator and three robots of varying processing capability and kinematics. We trained Fido on various common robotics tasks, such as following a line and driving to a speaker. Fido successfully learned all given tasks in simulation and in hardware with low latency, even on low power devices like the $5 Raspberry Pi Zero.

Details

The Idea

The original idea behind Fido was to, well, build a robot dog. My partner and I had no real machine learning or artificial intelligence knowledge, so we did a lot of reading. After a bunch of experimentation, we tried to science-up our idea a little bit and generalize it into something actually practical. That something practical is Fido.

Background

Today, most robots are procedurally programmed. That means that for every task any robot has to perform, an "expert" has to come in and write the code for it in a traditional programming language like C++, Java, Python, Fortran, etc. What Fido does, among other things, is eliminate the expert part of the equation and allow anybody to use and train any robot to perform any task. While this may seem like a grand statement, Fido was able to successfully run on a differential drive robot, a holonomic kiwi drive robot, and a robot arm with zero code changes. And it was actually practical too: we could train Fido to follow a line faster than we could program it. We chose reinforcement as a training method as it seemed organic: reinforcement is the natural method of learning and growth, and a simple concept for anybody to understand and use.

Scientific Impact

Beyond the practical applications, Fido also represents a breakthrough in the field of reinforcement learning for robotics. Traditional systems for reinforcement learning, such as Q-Learning, take thousands of learning iterations, or pieces of reward given, to learn basic tasks. This means that robots using this system can't really be trained by humans; instead, they must be trained by other robots with infinite attention spans and perfect reward-giving skills. We came up with a few fancy upgrades to Q-Learning that allow it to learn 3 times faster and 4 times better, even in noisy real world environments. This represents a fundemental shift in reinforcement learning for robotics: Fido actually makes the whole "humans training robots" deal practical. And beyond robotics and machine learning, this is a step towards a general intelligence. If we learn through reinforcement using constructs such as pain and enjoyment, who's to say that Fido doesn't feel the same way?

How We Built It

First, all of our work is open source under the MIT license and released on Github. We used git throughout our development process, so you can track our progress over the last few months. The codebase ended up being quite large, as to make Fido practical on embedded systems we had to write all of the machine learning constructs ourselves in C++. While this certainly took a lot of time and understanding, it ended up being super great for a few reasons.

First, we were able to make optimizations throughout the entire algorithm that gave us practically no noticeable latency, even while running on lower power devices like the Pi Zero. Second, we ended up having a lot of "left over code," stuff like genetic algorithms that we experimented with, but didn't actually end up in the AI. We decided to re-organize our repo as a general purpose open source C++ machine learning library for embedded electronics and robotics, and have been enjoying seeing Fido's popularity rise. We're proud to say that Fido now has over 300 stars on Github, and we've appreciated contributions from the open source community all across the country.

Implementation Details

The Fido project overall roughly broke down into two categories: the learning algorithm, and the hardware implementations. We'll be getting into the nitty gritty on both of these subjects in our build logs, but we'll summarize here.

We programmed the AI in C++ with no external dependencies, writing the entire project from scratch (over 5000 lines of C++). Next, we built three robots and a simulator to test the algorithm. The first robot, nicknamed "Thing One," has a 3D printed chassis (designed in AutoCAD student), uses a differential drive system, and is powered an Intel Edison for computation. After constructing Thing One,...

Files

Fido Paper.pdf

Our research paper.

Adobe Portable Document Format - 252.04 kB - 06/07/2016 at 02:00

Preview

Download

Components

1 × Wirefitted Feedforward Neural Network As the heart of Fido's learning algorithm, a feedforward neural network coupled with a least squares interpolator is used to model the reward function. This allows relations to be made both between similar actions and similar states, reducing required training iterations.

1 × Q-Learning Reinforcement Learning Algorithm Fido uses a modified version of the model-free q-learning to shape Fido's q-function, allowing for the system's universality.

1 × Uncertainty Approximation System Fido continuously calculates an uncertainty value, or the deviation between it's new and it's old model. This novel system allows Fido to detect when it's being retrained, allowing efficient hyperparameter optimization.

1 × Intelligent Probabilistic Action Selection Policy Fido uses it's uncertainty value to modify it's exploration in action selection, making Fido explore more when being retrained. This not only allows Fido to learn faster and better, but makes it highly retrainable.

1 × Dynamically Optimized Experience Replay In addition to training it's model on new experiences, Fido will also train on past experiences to decrease training time. The number of past experiences sampled is kept proportional to the derivative of the uncertainty value, as Fido shouldn't train from an old task's experiences.

Discussions

jaromir.sukuba wrote 06/08/2016 at 09:39

"An AI that lets any robot learn through positive and negative feedback."

Once this project goes to Hackaday blog, you'll receive a lot of negative feedback, I guess. That's how the community works.

Nice project, by the way.

Are you sure? yes | no

Joshua Gruenstein wrote 06/08/2016 at 20:25

Oh dear. What specifically do you think we should change? We tried to make the title general and understandable. In our paper we call it "A Universal Robot Control System using Reinforcement Learning with Limited Feedback," but that seemed a tad wordy for this context. Thanks for the heads up.

Are you sure? yes | no

jaromir.sukuba wrote 06/08/2016 at 21:03

No, don't get me wrong, I was just joking about some aspects of blog community.

I don't think you need to change something about it, the project is great as is. Do you have some video of robot (Thing 1 or Thing 2) in action? People love robot videos.

Are you sure? yes | no

Joshua Gruenstein wrote 06/08/2016 at 23:37

Ha, sorry, guess I needed some coffee. Yeah, we have some videos, but we've been a little busy and haven't gotten around to uploading them. We also plan on adding a lot of detail about the software and hardware implementations in our build log. Thanks for the positive feedback.

Are you sure? yes | no

Description

Details

The Idea

Background

Scientific Impact

How We Built It

Implementation Details

Files

Fido Paper.pdf

Components

Discussions

Similar Projects

ChaudhryBot

OpenCat - Open Source Quadruped Robotic Framework

Modular Open Source Humanoid AI Robot

BunnyBot

Fido

Become a Hackaday.io member

Just one more thing

Description

Details

The Idea

Background

Scientific Impact

How We Built It

Implementation Details

Files

Fido Paper.pdf

Components

Enjoy this project?

Discussions

Become a Hackaday.io Member

Similar Projects

ChaudhryBot

OpenCat - Open Source Quadruped Robotic Framework

Modular Open Source Humanoid AI Robot

BunnyBot

Does this project spark your interest?

Report project as inappropriate

Send message

Remove Member