After reviewing the existing systems and patterns for AI development (there are always small ones out there) I was faced with the task of looking at the three main variables that all the others are using, and somehow put them together to create something different.
- Sensors (Input)
High level I was thinking of something like this for the structure of Synapse:
That is not where I am anymore…
The struggle with a system like this is that the relationship between the goals and actions is not clear. I was struggling with identifying how to optimize the actions based on whether a goal was being achieved or not. This started me down a path where I was largely reinventing a reinforcement learning system. This type of a system has too much setup for what I am hoping to achieve, so I failed that structure as a potential means.
This forced me to take a look at those same three elements again and see if I could re-arrange them so that the relationship between them allows for training of actions based on goals.
Then came my AHA moment….
I kept thinking about how to optimize the actions, but in reality the system I had bouncing around in my head was only thinking about optimizing against the goal.
In this structure, the ends justify the means.
This started to make a lot of sense to me, and provided an explanation for irrational actions when current AI systems (reinforcement learning) are put together strictly rationally. Meaning each action is weighing the pros/cons of alternative actions before it decides on an action to execute.
In the system I’m proposing that optimizes against the goal, the action is nothing more than a side effect of that goal optimization.
Maybe this can explain why I do things that I can not explain to my fiance?
With that hypothesis in mind, I put together a neural architecture that looks like this.
I sat on this for a night before I came up with a twist that I think may be needed, or at a minimum could help.
In the process of falling asleep, I was contemplating how this goal oriented system would actually optimize for the goal while altering the actions, and I thought maybe it needed more information to alter the weights of the actions. If I just drop the input into the same layer (unchanged) as the middle action layer – illustrated below – then that information could be used to optimize the goal.
My reasoning for dropping the input into the action section is that this system is 2 networks in 1. The first network is the input -> action network, and the second is the action -> goal network. The difference is that all the weights will be updated according to the optimization against the goal.
What this means is that in the 2nd network (action -> goal) can use the input and the output of the first network in order to properly set the relationship between them and the goal. I might be over thinking this piece, and I will test both of the architectures and see if this change is needed but it should be helpful at a minimum.
I plan on executing this test with the standard feed forward and back propagation techniques of the most common neural networks. That should allow me to test out the hypothesis specifically and not the other variables I have in mind for future implementation enhancements.
If my results show that I am unable to optimize against the goal, and I have evidence that this structure does not work, I may need to review how the back propagation routine is calculating the contributions to the error for the weights, and potentially put more value on the weights of the actions. This structure (or similar) has the means to provide unlabeled learning and solve some of the biggest challenges of AI today, so I plan on digging into it this significantly in the next year.
Now comes the heavy lifting of actually implementing…
What are your thoughts on this? Is there any other examples of a system that is like this, or is this a novel approach?