I spent the last couple months working on an AI system where actions would be able to be executed to achieve a goal, by chaining a couple of different feed forward networks together. I have proven that this hypothesis and architecture is not going to result in learned behavior. I believe that it is not learning behaviors because I did not capture the relationship between a “good” goal state or a “bad” goal state.
In my new proposal I am still sticking with a feed forward network, but instead of ending at a goal, this ends at an action. The goal is not actually a part of the network, although it is still a critical part of the system. The neural network architecture for this system is illustrated below.
The innovative aspect to this feedforward network does not have to do with the network itself, but with the way that the goal(s) will be used to adjust the learning rate of the back propagation routine.
As a reminder, in Synapse, all neural objects (Inputs, Actions, Goals) are sensors. They may or may not need to be passed into the inputs of the network based on whether there would be a natural relationship between them, but they all need to be sensing in the regular loop of the system.
With the proposed dynamic learning rate, when the synapses of the system are tuned in back propagation, depending on whether the system is close to an ideal goal or not, will depend on how much the weights of the system will be adjusted – and whether they will be tuned up to increase activation, or tuned down to inhibit activation.
Now that there is a dynamic learning rate that ties together the relationship between the current goal state, and the ideal goal state, the system needs to be tuned on an “ideal action”. An ideal action is dynamic based on the current action. A computed action (the end result of the forward propagation) is never a clean vector with 0, or 1 as values. There are some options I am playing with, but the first attempt at creating an “ideal action” is to normalize the action and then backpropagate with that new normalized value.
I believe that the “ideal action” will vary based on how your motor system interprets the action sense, but the “ideal action” should accentuate the highs and lows so that the system can adjust itself to those outlying values.
The new proposal incorporates two innovative concepts to try to reach a system that learns actions without an explicit teacher, and only using sensors. Those concepts are:
- Dynamic learning rate based on ideal goal states versus actual goal states.
- Idealized actions to help the system reinforce or inhibit certain action features.
With the relationship between an ideal goal state and an actual goal state being a central part to how the system tunes its synapses the issues with the previous attempt should be fixed. I see some implementation details that need to be ironed out when creating “ideal actions”. If something goes wrong with this version, this is the area that is going to be most scrutinized.
Here is hoping third time is the charm. I’ll keep you updated!