
last codes

A project log for NSCE-ngbrain

scalable hardware for neural network systems

3drobert • 01/03/2022 at 04:15

Still adding enhancements like:

adadelta (a quick sketch of the update rule follows below),

minibatch (6 CPU iterations × 5 GPU experiences),

visualization, corrections, etc.
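
For reference, a minimal sketch of the Adadelta update (the standard formulation from Zeiler's paper; the names rho, eps and the numpy arrays are my own illustration, not the project's actual code):

import numpy as np

def adadelta_update(w, grad, state, rho=0.95, eps=1e-6):
    """One Adadelta step: no global learning rate needed.

    state holds two running averages, initialized to zeros:
      state["g2"]  - decaying average of squared gradients
      state["dx2"] - decaying average of squared updates
    """
    state["g2"] = rho * state["g2"] + (1 - rho) * grad**2
    # Scale the step by RMS(previous updates) / RMS(gradients)
    dx = -np.sqrt(state["dx2"] + eps) / np.sqrt(state["g2"] + eps) * grad
    state["dx2"] = rho * state["dx2"] + (1 - rho) * dx**2
    return w + dx

# usage
w = np.zeros(4)
state = {"g2": np.zeros_like(w), "dx2": np.zeros_like(w)}
w = adadelta_update(w, np.array([0.1, -0.2, 0.0, 0.3]), state)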

I'm also using a temporal_window variable on the input layer to append the last inputs + last actions (ConvnetJS :) ), which now gives better estimates of the reinforcement learning T value.
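
Roughly, the input vector gets built like this (a sketch with a hypothetical build_input helper; ConvnetJS's deepqlearn does the equivalent internally):

def build_input(state, history, temporal_window, num_actions):
    """Concatenate the current state with the last `temporal_window`
    (state, action) pairs; actions are one-hot encoded.

    `history` is a list of (state, action) tuples, most recent last.
    Illustration only, not the project's actual code.
    """
    x = list(state)
    for past_state, past_action in history[-temporal_window:]:
        x.extend(past_state)
        one_hot = [0.0] * num_actions
        one_hot[past_action] = 1.0
        x.extend(one_hot)
    return x

# usage: 2-dim sensor state, 3 actions, window of 2
hist = [([0.1, 0.2], 0), ([0.3, 0.4], 2)]
x = build_input([0.5, 0.6], hist, temporal_window=2, num_actions=3)
# len(x) == 2 + 2 * (2 + 3) == 12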

before:

_N = 0
maxval = forward(state_N+1)   // max Q-value predicted for the next state
t = reward_N * maxval         // single-step target
forward(state_N)
backward(action_N, t)         // train the taken action towards t

state1_t0(+1) = 1 // state0action0 useful

state2_t0(-1) = -1 // state1action1 useful

and now...:

_N = 0
t = 0
foreach step in temporal_window:
  maxval = forward(state_N+1)   // max Q-value of each successive state
  t += reward_N * maxval        // accumulate the target across the window
  _N++
_N = 0
forward(state_N)
backward(action_N, t)           // train the first action towards the summed t

state1_t0(+1) + state2_t1(-1) = 0 // state0action0 not useful (or take it as -1?)

state2_t0(-1) + state3_t1(+1) = 0 // state1action1 not useful

another example:

state1_t0(+1) + state2_t1(+1) = 2 // state0action0 very useful

state2_t0(-1) + state3_t1(-1) = -2 // state1action1 very useful
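
Putting it together, a minimal runnable sketch of the windowed target (Python stand-in; forward and the experience tuples are placeholders for the real net):

def windowed_target(experiences, forward, temporal_window):
    """Accumulate the training target over the next `temporal_window`
    steps, instead of using only the immediate next state.

    `experiences` is a list of (state, action, reward, next_state)
    tuples; `forward(state)` returns the max Q-value for that state.
    Placeholder names for illustration, not the project's actual code.
    """
    t = 0.0
    for n in range(temporal_window):
        _, _, reward, next_state = experiences[n]
        t += reward * forward(next_state)   # reward_N * maxval, summed
    return t

# training step for the first experience in the window:
# state0, action0 = experiences[0][0], experiences[0][1]
# t = windowed_target(experiences, forward, temporal_window)
# net.forward(state0); net.backward(action0, t)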

--------------------------------------------------------------------------------------

Someday I want to try what happens if I connect a blank neural network as a reward applicator (to avoid indicating any reward by hand), with this reward net fed from some kind of long-memory net.

The reward net's updates would be modeled on something like per-neuron "cell energy" variables, with threshold values that send backward signals to this applicator, somehow associating the current input with the long memory.

The input layer from the normal sensors would drive actions as usual, but would also feed long_memory > reward_applicator, making a closed-loop system.

I won't be able to give a reward for walking towards the food, but if the long memory + reward happens to help reach the food by chance, the neurons receive their energy and that gets recorded. Otherwise... natural selection.
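
As a very rough sketch of that wiring (every class and signal here is hypothetical, just the closed loop from the paragraphs above):

class ClosedLoopAgent:
    """Hypothetical wiring: sensors drive both the action net and a
    long-memory net, and the long memory drives a reward-applicator
    net that replaces the hand-written reward. Illustration only."""

    def __init__(self, action_net, long_memory, reward_applicator,
                 energy_threshold=1.0):
        self.action_net = action_net
        self.long_memory = long_memory
        self.reward_applicator = reward_applicator
        self.energy_threshold = energy_threshold

    def step(self, sensor_input):
        action = self.action_net.forward(sensor_input)    # act as usual
        memory = self.long_memory.forward(sensor_input)   # associate input
        reward = self.reward_applicator.forward(memory)   # self-generated reward
        # "cell energy": only past the threshold do signals go backwards
        if reward >= self.energy_threshold:
            self.action_net.backward(action, reward)      # record what worked
            self.long_memory.store(sensor_input, reward)
        return action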

or something similar :D
