video of experiment



robot

STM32F303 : 72 MHz Cortex-M4 with FPU (an AVR can be used too, for slow robots ^_^)

line sensors APDS9950 : digital I2C RGB output; each one has the same address, so

I am using software I2C (with a common SCL)
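since all sensors answer on the same I2C address, one way to handle it is to share SCL and give each sensor its own SDA pin; a minimal sketch of such bit-banged I2C (the GPIO helpers, the sensor count and the exact wiring are assumptions, not the actual firmware):

```c
/* sketch of software (bit-banged) I2C: one shared SCL, one SDA pin per
   sensor - the GPIO helpers below are hypothetical wrappers around the
   MCU's registers */

#include <stdint.h>

#define SENSORS_COUNT 8              /* assumed number of line sensors  */

void scl_high(void);                 /* release shared SCL (open drain) */
void scl_low(void);                  /* pull shared SCL low             */
void sda_high(uint8_t sensor);       /* release SDA of one sensor       */
void sda_low(uint8_t sensor);        /* pull SDA of one sensor low      */
uint8_t sda_read(uint8_t sensor);    /* read SDA level of one sensor    */
void i2c_delay(void);                /* half-period delay               */

/* write one byte to the selected sensor, MSB first, return 1 on ACK */
uint8_t soft_i2c_write_byte(uint8_t sensor, uint8_t value)
{
    for (uint8_t bit = 0; bit < 8; bit++)
    {
        if (value & 0x80) sda_high(sensor); else sda_low(sensor);
        value <<= 1;

        scl_high(); i2c_delay();     /* slave samples SDA while SCL is high */
        scl_low();  i2c_delay();
    }

    sda_high(sensor);                /* release SDA for the acknowledge bit */
    scl_high(); i2c_delay();
    uint8_t ack = (sda_read(sensor) == 0);
    scl_low();  i2c_delay();

    return ack;
}
```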


line follower problem

a common line follower consists of a line position sensor, a PD controller for turning, some speed controller (a PID can be used) and motors :
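for comparison, a minimal sketch of that classic PD solution (the gains, the base speed and the line_position() / motor_*() helpers are assumptions):

```c
/* classic PD line follower sketch - not the RL solution described below */

float line_position(void);           /* -1 .. 1, 0 = line under the middle */
void motor_left(float power);        /* -1 .. 1 */
void motor_right(float power);

#define KP     0.8f                  /* proportional gain - assumed value */
#define KD     4.0f                  /* derivative gain   - assumed value */
#define SPEED  0.5f                  /* base forward speed                */

void pd_control_step(void)
{
    static float error_prev = 0.0f;

    float error = line_position();            /* P term: current error   */
    float derivative = error - error_prev;    /* D term: change of error */
    error_prev = error;

    float turn = KP * error + KD * derivative;

    motor_left(SPEED + turn);                 /* steer back toward the line */
    motor_right(SPEED - turn);
}
```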

in this solution, reinforcement learning is used :

the agent treats the robot as a black box (see figure)

the agent obtains only the current state and reward and chooses actions, but it doesn't know what an action actually does;

the state is the last three line positions

the reward expresses how well the robot is following the line - how far it is from the middle (the middle is "0")
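a minimal sketch of how such a state and reward could be built (the discretisation into 8 line positions and the line_position() helper are assumptions):

```c
#include <stdint.h>
#include <math.h>

#define POSITIONS 8                  /* assumed number of discrete line positions */

float line_position(void);           /* -1 .. 1, 0 = line under the middle */

/* state = last three discretised line positions packed into one number */
uint32_t get_state(void)
{
    static uint32_t p0 = 0, p1 = 0, p2 = 0;

    p2 = p1;
    p1 = p0;
    p0 = (uint32_t)((line_position() + 1.0f) * 0.5f * (POSITIONS - 1) + 0.5f);

    return p0 + p1 * POSITIONS + p2 * POSITIONS * POSITIONS;
}

/* reward = how close the robot is to the middle of the line */
float get_reward(void)
{
    return 1.0f - fabsf(line_position());    /* 1.0 on the middle, 0.0 at the edge */
}
```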


reinforcement learning

consists of four basic steps (a sketch of the loop follows the list) :

1) obtain state

2) execute action

3) obtain reward

4) learn from reward
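a minimal sketch of that loop, reusing the helpers from the sketches above (select_action(), execute_action() and q_learn() are assumptions; the q_learn() update itself is sketched in the function approximation section):

```c
#include <stdint.h>

uint32_t get_state(void);                   /* see the state/reward sketch      */
float get_reward(void);
uint32_t select_action(uint32_t state);     /* e.g. epsilon-greedy over Q(s, a) */
void execute_action(uint32_t action);       /* set motor speeds for the action  */
void q_learn(uint32_t state, uint32_t action, float reward, uint32_t state_next);

void rl_step(void)
{
    static uint32_t state_prev = 0;
    static uint32_t action_prev = 0;

    uint32_t state = get_state();            /* 1) obtain state                      */

    float reward = get_reward();             /* 3) obtain reward for the last action */
    q_learn(state_prev, action_prev, reward, state);   /* 4) learn from it           */

    uint32_t action = select_action(state);
    execute_action(action);                  /* 2) execute the newly chosen action   */

    state_prev = state;
    action_prev = action;
}
```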

the system maintains a "table" of values Q(s, a) : how good action "a" was in state "s"

Q(s, a) can be computed using Q-learning
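written as a formula, the simplified one-step update used in the example below is :

Q(s, a) = R(s, a) + gamma * max Q(s', a')

where the maximum is taken over all actions a' available in the next state s' (the general Q-learning rule also blends in the old Q value with a learning rate; the example below uses this simplified form)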


in plain words (see the picture)

to compute the Q value, consider that the agent is in state S1 and takes action A4, after which it ends up in state S2; the Q value can then be calculated with these steps (a worked result follows the list)

1) look at the obtained reward : R(s1, a4) = 0.3

2) look at the state this action led you to (into S2)

3) choose the highest Q value in that state : Q(s2, a2) = 0.6, and multiply it by gamma = 0.9

4) sum the reward and this value to get Q(s1, a4)
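with the numbers above this gives : Q(s1, a4) = R(s1, a4) + gamma * max Q(s2, a) = 0.3 + 0.9 * 0.6 = 0.84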

note : there is no need to know in advance into which state the chosen action leads - the agent learns from what actually happened

function approximation

for a small number of states, a plain table for storing Q(s, a) can be used;

I am using an associative neural network (a description will follow in a few days)
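for the plain-table variant, a minimal sketch of the storage and of the Q-learning update (the table size, the action count and the learning rate are assumptions):

```c
#include <stdint.h>

#define STATES_COUNT   512           /* e.g. 8 positions ^ 3 history steps - assumed */
#define ACTIONS_COUNT  4             /* assumed number of turning actions            */
#define ALPHA          0.25f         /* learning rate  - assumed                     */
#define GAMMA          0.9f          /* discount factor, as in the example above     */

static float q[STATES_COUNT][ACTIONS_COUNT];

void q_learn(uint32_t state, uint32_t action, float reward, uint32_t state_next)
{
    /* highest Q value reachable from the next state */
    float q_max = q[state_next][0];
    for (uint32_t a = 1; a < ACTIONS_COUNT; a++)
        if (q[state_next][a] > q_max)
            q_max = q[state_next][a];

    /* move Q(s, a) toward reward + gamma * max Q(s', a') */
    q[state][action] += ALPHA * (reward + GAMMA * q_max - q[state][action]);
}
```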