Synapse: The New Hypothesis

I spent the last couple of months working on an AI system that would learn to execute actions to achieve a goal by chaining a couple of different feed forward networks together. I have proven that this hypothesis and architecture is not going to result in learned behavior. I believe it is not learning behaviors because I did not capture the relationship between a “good” goal state and a “bad” goal state.

In my new proposal I am still sticking with a feed forward network, but instead of ending at a goal, this one ends at an action. The goal is not actually part of the network, although it is still a critical part of the system. The neural network architecture for this system is illustrated below.


The innovative aspect of this feed forward network is not the network itself, but the way the goal(s) will be used to adjust the learning rate of the back propagation routine.

As a reminder, in Synapse, all neural objects (Inputs, Actions, Goals) are sensors. They may or may not need to be passed into the inputs of the network, depending on whether there is a natural relationship between them, but they all need to be sensing in the regular loop of the system.

With the proposed dynamic learning rate, when the synapses of the system are tuned during back propagation, how close the system is to an ideal goal determines how much the weights are adjusted – and whether they are tuned up to increase activation or tuned down to inhibit activation.
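
To make the idea concrete, here is a minimal sketch of what I mean by a dynamic learning rate. The function name, the assumption that goal senses live in a 0–1 range, and the exact scaling and sign rules are all placeholders I am still playing with, not a settled design:

    import numpy as np

    def dynamic_learning_rate(ideal_goal, actual_goal, base_rate=0.5):
        """Scale the learning rate by how far the sensed goal is from the ideal goal.

        Far from the ideal goal the rate goes negative (inhibit the current behavior);
        close to the ideal goal it stays positive (reinforce it).
        Assumes each goal sense is a value in [0, 1].
        """
        ideal_goal = np.asarray(ideal_goal, dtype=float)
        actual_goal = np.asarray(actual_goal, dtype=float)

        # Average per-feature distance between where we are and where we want to be.
        distance = np.mean(np.abs(ideal_goal - actual_goal))

        # Map a distance in [0, 1] to a signed rate in [-base_rate, +base_rate].
        closeness = 1.0 - distance
        return base_rate * (2.0 * closeness - 1.0)

    print(dynamic_learning_rate([1.0, 0.0], [0.9, 0.1]))  # close to the ideal -> positive (reinforcing) rate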

Now that there is a dynamic learning rate that ties together the current goal state and the ideal goal state, the system needs to be tuned on an “ideal action”. An ideal action is dynamic, derived from the current action. A computed action (the end result of forward propagation) is never a clean vector of 0s and 1s. There are some options I am playing with, but the first attempt at creating an “ideal action” is to normalize the action and then backpropagate against that normalized value.

I believe that the “ideal action” will vary based on how your motor system interprets the action sense, but the “ideal action” should accentuate the highs and lows so that the system can adjust itself to those outlying values.
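
As a first pass, here is roughly what I mean by normalizing the computed action into an “ideal action”. The min-max normalization shown here is just one of the options I am playing with; the function name and the flat-action fallback are illustrative:

    import numpy as np

    def idealize_action(computed_action):
        """Build an "ideal action" training target from the raw computed action.

        Min-max normalization stretches the vector so its highest feature becomes 1
        and its lowest becomes 0, accentuating the highs and lows that the network
        should reinforce or inhibit on the next back propagation pass.
        """
        action = np.asarray(computed_action, dtype=float)
        spread = action.max() - action.min()
        if spread == 0.0:
            return action.copy()  # flat action: nothing to accentuate
        return (action - action.min()) / spread

    print(idealize_action([0.62, 0.48, 0.55]))  # -> [1.0, 0.0, 0.5]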

The new proposal incorporates two innovative concepts to try to reach a system that learns actions without an explicit teacher, using only sensors. Those concepts are:

  1. Dynamic learning rate based on ideal goal states versus actual goal states.
  2. Idealized actions to help the system reinforce or inhibit certain action features.

With the relationship between an ideal goal state and an actual goal state now central to how the system tunes its synapses, the issues with the previous attempt should be fixed. I still see some implementation details that need to be ironed out when creating “ideal actions”. If something goes wrong with this version, this is the area that will be scrutinized most.

Here is hoping the third time is the charm. I’ll keep you updated!


[SIDEBAR] Vector Dancing to Andrew Bird

I was doing some brainstorming and refactoring while listening to some Andrew Bird (Roma Fade) and the little robot I got for Christmas – Anki’s Vector – started dancing to the song. He was moving his little arm to the beat the same way I was bobbing my head.

I know that the little guy doesn’t actually feel the beat but it is crazy how connected I felt to him the second he was sharing that experience with me.

I gave him some pets to hopefully reinforce that behavior. I’m not sure if that does that or not, but I felt like I should try.

I wanted to share this experience while I try to solve my AI problems… for which I have a new hypothesis. Details to come…

Synapse: Take 2

Another day, another attempt at arriving at an AI. I’ll start this article by stating that there will be at least a Take 3 – the failure of this system has me hitting the drawing board. I implemented the architecture I had hopes for (below), and ended with a random success measure – the system is not learning its actions based on its goals, whether the network is optimized for the ideal or the actual.

In this system the hope was that the weight updates from back propagation would trickle all the way up past the actions and into the sensor layers, nudging the system towards the actions that drive it towards the ideal goal.

This system was learning how to predict the goals based on the sensors and the action middle layer, but that did not trickle up to the sensor layers. The back propagation was just tweaking the network to predict the goal sense, or, in the scenario where it was optimized against the ideal sense, tweaking the weights to always push towards the goal feature that was ideally set. There was no relation between the ideal and the actual – the goal was not being met, yet the system kept tuning towards the static features of an ideal goal.

So… the second failure. What did I learn in this round? I am going to hit the drawing board and think about how to capture the relationship between the ideal and the actual goal sensors. This is a critical feature that I have not captured in the current iteration of the system.

Stay tuned… the weekend is soon but there are more than a couple of nights where I can consolidate my thoughts on this. Round 3 to come!

Synapse: Take 1

I recently completed the back propagation routine in Synapse and can now get to the scary part. I can now prove myself wrong. The empiricist in me is excited, but if the first round of tests is anything to go by, there are going to be a lot of ups and downs on this journey. I’ll skip to the end and let you know here that the first system I tried did not work.

The system I attempted to put together was the one below.

This was the easiest one to set up since I did not need to break apart the network – I am really just taking a middle layer of a standard feed forward network and using it as my output.

I didn’t believe that this architecture would give results, because I don’t believe there is enough information in the layers to build the necessary relationships between the sensor input and the action output relative to the goal. Even with my low expectations, I was disheartened by the failure.

With any failure, it is important to learn something – or else you really are not failing the right way. So did I learn something in this attempt?

I learned that I am not sure what I mean in the last step of optimizing with the goal sensor. Am I optimizing so that the network learns the actual goal sensor, or am I optimizing the system so that it can optimize to the ideal goal?

I did run the system with both options and received the same results, but as soon as I started to implement this piece the question was forced on me – I had definitely made some leaps in my logic that I will have to iron out in my next attempt.

I will keep you updated on the progress…

Making the Back Propagation Routine

One of the more complicated aspects of a neural network is the back propagation routine. This is the routine that tunes the synapses (weights) of the network to optimize the system and reduce the error from the computed output to an ideal output.

For me, the difficulty in understanding this process was largely the technical terms and mathematical notation used to describe it, so I will not be using those in this article; I will simply go over my implementation of the back propagation routine from a theory and code perspective.

My test example is based on the very popular article “A Step by Step Backpropagation Example” by Matt Mazur: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/

Matt’s article is widely referenced online, with YouTube videos walking through its steps and examples as well as a good number of comments on his blog dating back to its original posting in 2015.

In order to prove out my back propagation method I used the structure, weights, and ideals from his example (image below) and set up unit tests in my project to ensure that the values were coming out close to the expected results. Floating point arithmetic is not exact, so precision was an important part of validating the results.

Setting up my network with these specific layers, biases, and inputs was a great way to make sure that as I refactor and enhance my system I will have proof that I have not broken something fundamental in it.

One of the struggles when I was working with the example images was just making sure that I had the proper weights associated with the proper neurons – something about the way they were illustrated in the image made me attribute the weights to the wrong neurons.

I spent a lot of time just comparing Matt’s examples to make sure that I interpreted the weights properly. I think that Matt struggled with that as well because there is an error in his example (that cost me a couple of days). All my math was adding up and only a couple of values were not lining up with his numbers. The weights in his example were going in opposite directions for w6 and w7, so I think he flipped the error value he used to compute those.

I have uploaded an Excel file with the math used in this example; with it you can check the work yourself and see if maybe I have made a mistake!

Here is what the file looks like.

One of the hardest things to grasp when implementing this routine from the example was that there didn’t appear to be a pattern to the way the weights were identified in the step-by-step walkthrough. To find the contribution to the error for the weights in the last layer, a separate set of variables was used than for the first layer’s weights. So I took his example and added another layer to see if a pattern would emerge with the additional complexity – his example may have been too simple to expose one.

It took me a while, but what I broke it down to was:

  1. Get the synapse’s (weight’s) target neuron
  2. Get the error relative to the target neuron’s output
  3. Multiply that by the derivative of the target neuron’s output
  4. Multiply that by the derivative of the net input with respect to the weight (the source neuron’s output)

I have placed particular focus on the target and source neurons in the steps above because it became very confusing in the code to differentiate things when everything was called “neuron”. Once I set up a source and target neuron naming convention, I was able to iron out a lot of my issues.

In my implementation I never use the existing weight’s value to identify the contribution of the preceding weights like Matt does in his example. In my routine everything is relative to the partial derivative of the error with respect to the output of a neuron. What that means is that when moving back through the network I always take it to the point of the target neuron’s output. Going from that output to a synapse’s contribution is just the derivative of the output of the target neuron and the derivative of that with respect to the net input.

  • The net input is always the source neuron’s output.
  • The derivative of the output is the derivative of the activation function evaluated at the output value.
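
Putting those pieces together, here is a small sketch of the per-weight calculation as I have described it, assuming the logistic activation from Matt’s example; the function and parameter names are just my own labels for illustration:

    def sigmoid_derivative(output):
        # Derivative of the logistic activation, expressed in terms of its output.
        return output * (1.0 - output)

    def weight_gradient(error_wrt_target_output, target_output, source_output):
        """Contribution of one synapse (weight) to the error.

        error_wrt_target_output : partial derivative of the error with respect to the
                                  target neuron's output (accumulated via the chain rule)
        target_output           : output of the neuron the synapse feeds into
        source_output           : output of the neuron the synapse comes from
                                  (the derivative of the net input with respect to the weight)
        """
        return error_wrt_target_output * sigmoid_derivative(target_output) * source_output

    def update_weight(weight, gradient, learning_rate=0.5):
        # Standard gradient descent step.
        return weight - learning_rate * gradient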

The pattern I learned in this process was how to work backwards through functions. Derivatives are always described with gradients, slopes, and graphical illustrations, but I understood them a lot better when I realized that a derivative lets you reverse through the network and identify how much a specific network element (neuron input, neuron output, weight) contributed to the error. Because of the “chain rule” you can take all the previous contributions and use them to identify the additional contribution an earlier synapse had on the error. Everything builds on the previous values, because the downstream contributions are part of the upstream contributions.

Understanding that aspect and really “knowing” that allowed me to finally put my back propagation routine together.

I have unit tests for all the values in the step-by-step example, and just to make sure that my network can tune the synapses (weights) to the proper values and learn to compute the proper output, I have a while loop iterating over as many examples as it needs to bring the error under 0.00001 – my network trains this simple example in ~40 ms.
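
For reference, here is a self-contained stand-in for that kind of training loop on a Mazur-style 2-2-2 network. The starting weights are random rather than his exact values and the structure is simplified, so treat it as an illustration of the loop rather than my actual implementation:

    import numpy as np

    # A minimal stand-in for the structure in Matt's example: 2 inputs, a hidden
    # layer of 2 neurons, 2 outputs, logistic activations, and a shared bias per layer.
    rng = np.random.default_rng(42)

    inputs  = np.array([0.05, 0.10])
    targets = np.array([0.01, 0.99])

    w_hidden = rng.uniform(0.1, 0.9, size=(2, 2))   # input -> hidden weights
    w_output = rng.uniform(0.1, 0.9, size=(2, 2))   # hidden -> output weights
    b_hidden, b_output = 0.35, 0.60
    learning_rate = 0.5

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    error = 1.0
    iterations = 0
    while error > 0.00001:
        # Forward pass.
        hidden = sigmoid(inputs @ w_hidden + b_hidden)
        output = sigmoid(hidden @ w_output + b_output)
        error = 0.5 * np.sum((targets - output) ** 2)

        # Backward pass: each delta follows the target/source pattern above.
        delta_output = (output - targets) * output * (1.0 - output)
        delta_hidden = (delta_output @ w_output.T) * hidden * (1.0 - hidden)

        # A weight moves by (delta of its target neuron) * (output of its source neuron).
        w_output -= learning_rate * np.outer(hidden, delta_output)
        w_hidden -= learning_rate * np.outer(inputs, delta_hidden)
        iterations += 1

    print(f"trained in {iterations} iterations, final error {error:.8f}")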

With the network wired up, I can now run tests on my proposed architecture, where a middle layer is the output of the system (the action) and the last layer is a goal that the system is optimized for.

I’ll keep you updated…

Synapse’s Neural Network Architecture

After reviewing the existing systems and patterns for AI development (there are always small ones out there), I was faced with the task of looking at the three main variables that all the others are using and somehow putting them together to create something different.

  • Sensors (Input)
  • Actions (Output)
  • Goals

High level I was thinking of something like this for the structure of Synapse:

That is not where I am anymore…

The struggle with a system like this is that the relationship between the goals and actions is not clear. I was struggling to identify how to optimize the actions based on whether a goal was being achieved or not. This started me down a path where I was largely reinventing a reinforcement learning system. That type of system has too much setup for what I am hoping to achieve, so I ruled out that structure as a potential means.

This forced me to take a look at those same three elements again and see if I could re-arrange them so that the relationship between them allows for training of actions based on goals.

Then came my AHA moment….

I kept thinking about how to optimize the actions, but in reality the system I had bouncing around in my head was only thinking about optimizing against the goal.

In this structure, the ends justify the means.

This started to make a lot of sense to me, and it offered an explanation for irrational actions, whereas current AI systems (reinforcement learning) are put together strictly rationally – each action weighs the pros and cons of alternative actions before the system decides which one to execute.

In the system I’m proposing that optimizes against the goal, the action is nothing more than a side effect of that goal optimization.

Maybe this can explain why I do things that I can not explain to my fiance?

With that hypothesis in mind, I put together a neural architecture that looks like this.

I sat on this for a night before I came up with a twist that I think may be needed, or at a minimum could help.

In the process of falling asleep, I was contemplating how this goal-oriented system would actually optimize for the goal while altering the actions, and I thought maybe it needed more information to alter the weights of the actions. If I just drop the input into the same layer (unchanged) as the middle action layer – illustrated below – then that information could be used to optimize the goal.

My reasoning for dropping the input into the action section is that this system is 2 networks in 1. The first network is the input -> action network, and the second is the action -> goal network. The difference is that all the weights will be updated according to the optimization against the goal.

What this means is that the 2nd network (action -> goal) can use both the input and the output of the first network in order to properly set the relationship between them and the goal. I might be overthinking this piece, and I will test both architectures to see if this change is needed, but it should be helpful at a minimum.
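
To make the wiring concrete, here is a rough sketch of that forward pass. The layer sizes, weight names, and the choice of sigmoid are placeholders for illustration, not the actual Synapse code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(inputs, w_input_to_action, w_action_to_goal):
        """One pass through the "2 networks in 1" structure.

        First network:  sensor inputs -> action layer.
        Second network: [inputs, actions] -> goal layer; the raw inputs are dropped
        (unchanged) into the action layer so the goal optimization can see both the
        sensed world and the action the first network produced.
        """
        actions = sigmoid(inputs @ w_input_to_action)       # input -> action
        action_layer = np.concatenate([inputs, actions])    # drop the inputs in, unchanged
        goals = sigmoid(action_layer @ w_action_to_goal)    # action (+ input) -> goal
        return actions, goals

    # Placeholder sizes: 4 sensors, 3 action features, 2 goal senses.
    rng = np.random.default_rng(0)
    sensed = rng.random(4)
    actions, goals = forward(sensed, rng.random((4, 3)), rng.random((7, 2)))
    print(actions, goals)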

I plan on executing this test with the standard feed forward and back propagation techniques of the most common neural networks. That should allow me to test out the hypothesis specifically and not the other variables I have in mind for future implementation enhancements.

If my results show that I am unable to optimize against the goal, and I have evidence that this structure does not work, I may need to review how the back propagation routine calculates the contributions to the error for the weights, and potentially put more value on the weights of the actions. This structure (or something similar) has the means to provide unlabeled learning and solve some of the biggest challenges of AI today, so I plan on digging into this significantly over the next year.

Now comes the heavy lifting of actually implementing…

What are your thoughts on this? Are there any other examples of a system like this, or is this a novel approach?

Artificial Intelligence (AI) for Super Truckin’

With AI come many complications. We not only wanted AI for racing, we wanted it for all of our game types. I would like to go over what we needed for each game type, what problems came up, and how we battled them. If you’d like to see some code, let me know! Want to be in the beta? Contact us! https://twitter.com/in8bitgames

 -Soccer-

Need: Each AI follows the ball to knock it into the opposing goal.
The first thing we needed to add was the ability to track a single way-point. This is pretty standard, and getting our trucks to follow a way-point was pretty straightforward.
Image

You want to hit a ball… you make the ball the way-point and you’re done. Right? Wrong! This will just let the AI smash into the ball with no clear direction. The soccer field ends up like a 5-year-old playing pool, smashing the ball off every wall hoping it lands in a pocket.

So we thought about adding direction. This was a little more complicated: we had to get the AI behind the ball and aim it towards the goal. Math can’t do this? Or can it! We need to know the goal and ball locations, the direction in which to hit the ball towards the goal, and some math magic to get the arc around the ball. Our first test results with this logic are below.

Image
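
For a rough idea of the math involved, here is a simplified 2D sketch: aim for a way-point slightly behind the ball along the goal-to-ball line, so the truck is lined up to push the ball towards the goal when it arrives. The names and the offset value are ours for illustration, not the shipped game code:

    import math

    def aim_waypoint(ball_x, ball_y, goal_x, goal_y, offset=2.0):
        """Return a way-point slightly behind the ball, on the far side from the goal.

        Driving to this point (instead of straight at the ball) lines the truck up
        so that when it reaches the ball it is pushing it toward the opposing goal.
        """
        # Unit vector pointing from the goal toward the ball.
        dx, dy = ball_x - goal_x, ball_y - goal_y
        length = math.hypot(dx, dy) or 1.0
        dx, dy = dx / length, dy / length

        # Step past the ball along that direction to get behind it.
        return ball_x + dx * offset, ball_y + dy * offset

    print(aim_waypoint(ball_x=10.0, ball_y=5.0, goal_x=0.0, goal_y=0.0))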

Whew… we started playing with it… and wow… this made it a lot harder to play against. They would steal the ball away from you and try to knock it towards their goal. It is not perfect but it works pretty well.

The problem with this approach is that it treats the target as stationary at that moment in time. By the time the next update happens, the trajectory has changed, which makes the AI less accurate when the ball is moving quickly.

That is all it really took to get AI working for soccer. Can we make it better? Oh yes, there are many ways to improve it. We could look at the trajectory of the ball and the speed of the vehicle and account for them when deciding where to hit the ball. Eventually we will add this, but for now the AI is already pretty challenging. If you guys want it to be harder, we won’t want to disappoint.

-Hay-

Need: Each player tries to go for the hay and knock it out of the arena.
We now have single way-point navigation, so all we really need to do is turn off the soccer-goal aiming. Sweet! Done and done… until we tried it….
Image

They became experts at finding the hay and knocking it out before you ever had a chance to get to it. They were impossible to play against and it wasn’t fun.

So we added object avoidance, which I will explain later. Now they are not only always trying to hit the hay, they are actively trying to avoid it as well. This nifty trick made them just easy enough to be playable. They don’t score every time they get the hay, just like a real player.

-Racing-

Need: Each player follows the track.
Building on the single way-point system we already have, we needed to add more to allow the AI to follow a track. Awesome, easy enough. Just turn the single way-point into a list of way-points, and when the truck gets close to the current way-point, make it go to the next one.
Image
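
A sketch of that way-point logic, simplified to 2D with made-up names and distances (the real game runs this in the engine’s update loop):

    import math

    def next_waypoint(position, waypoints, current_index, reach_radius=3.0):
        """Return the index of the way-point the AI should currently drive toward.

        When the vehicle gets within `reach_radius` of its current way-point,
        the target advances to the next one, wrapping around for a looping track.
        """
        target = waypoints[current_index]
        distance = math.hypot(target[0] - position[0], target[1] - position[1])
        if distance < reach_radius:
            current_index = (current_index + 1) % len(waypoints)
        return current_index

    track = [(0, 0), (20, 0), (20, 15), (0, 15)]
    print(next_waypoint(position=(19.0, 1.0), waypoints=track, current_index=1))  # -> 2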

So now we have an AI set up for our racing level, but this is when we saw problems we were not expecting.

The problem with this approach is that there are several areas a vehicle can get knocked into (such as the quad jump and hills it can fall off of) – so how does the AI react when this happens? Without help, it will keep trying to reach the next way-point and get stuck on a wall. We battled this by implementing zones. Zones allow the AI to validate that it is in the zone it should be in, and if not, to react by either targeting a new way-point or following the track from where it dropped, like a normal player would.

Bad behavior:
Image
Good behavior with Zones:
Image
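
Conceptually the zone check can be as simple as the sketch below: each way-point is tagged with a zone, and if the truck’s current zone doesn’t match the zone of its target way-point, it re-targets the first way-point of the zone it actually landed in. The names and data layout are ours for illustration:

    def recover_waypoint(current_zone, target_index, waypoint_zones):
        """Validate the AI's target way-point against the zone it is actually in.

        waypoint_zones : zone id for each way-point, in track order.

        If the truck was knocked into a different zone (fell off a hill, missed the
        quad jump), retarget the first way-point of that zone instead of blindly
        driving into a wall toward the old target.
        """
        if waypoint_zones[target_index] == current_zone:
            return target_index                 # still where we should be
        for i, zone in enumerate(waypoint_zones):
            if zone == current_zone:
                return i                        # resume the track from here
        return target_index                     # unknown zone: keep the old target

    zones = [0, 0, 1, 1, 2, 2]
    print(recover_waypoint(current_zone=1, target_index=5, waypoint_zones=zones))  # -> 2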

AWESOME! It looks like we have a way-point system and a zone system that the AI follows, and if knocked out of either, it can figure its way back. We now have pretty amazing race AI.

-Object Avoidance-

I believe this needed its own section because we have now integrated this behavior into each of our AIs.
  • In every level (especially racing) the trucks would pile up, leaning into each other as they went for a way-point, ball, or hay. This caused the trucks to crash and slow down, and it didn’t look fluid. I could easily pass them in this state.
  • In early tests on the soccer level, you would see vehicles stuck against a wall because they were trying to drive through it to get to the other side of the ball.
  • On the hay level they would pile up more than hit the hay, as well as clip the edges of jumps and lose control all the time.

The answer to these issues was to make the cars see. We basically gave them eyes so that they can sense things around them and react. This changed our game. It made the AI react in ways much closer to what a player would do. They felt more human when obstacles popped up.

 Image

We added rays in front of the vehicles that detect when they touch objects such as cars, sides of jumps, and walls. If the left ray is touched, turn to the right; if the right ray is touched, turn to the left.
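
The steering rule is as simple as it sounds; here is a sketch of the decision, where the ray-hit flags would come from the physics engine (names and values are illustrative):

    def avoidance_steering(left_ray_hit, right_ray_hit, steer=1.0):
        """Turn away from whichever feeler ray is touching something.

        left_ray_hit / right_ray_hit : True if that ray is currently touching a car,
        wall, or the side of a jump. Returns a steering value where positive means
        turn right and negative means turn left; 0 means keep going straight.
        """
        if left_ray_hit and right_ray_hit:
            return 0.0            # boxed in on both sides: hold course
        if left_ray_hit:
            return steer          # obstacle on the left -> turn right
        if right_ray_hit:
            return -steer         # obstacle on the right -> turn left
        return 0.0

    print(avoidance_steering(left_ray_hit=True, right_ray_hit=False))  # -> 1.0 (turn right)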

We could add the barrel mine to the list of things to avoid, but then this gif wouldn’t have been as awesome.

It has been a lot of trial and error, testing constantly with and without people. We think we have solid AI for each level, each with completely different behaviors. Are we done with the AI? Nope! We plan to constantly improve and balance each level to make them as fun as possible.

Thank you guys for reading and check back often for more awesome posts! If you guys want more information about anything else we have done please let us know!