Synapse: Modeling in Excel

One of the difficulties I have when modeling my different theories of neuronal network architectures is visualizing the relationships of the system. I was thinking of creating a Unity (video game engine) plugin that would visualize the network with colors and shapes, so that when the network processes input and learns, the relationships could be presented in a way that allows us to interpret and identify the relationships between components much more easily.

There is a tensor library attempting to do that here that I thought was interesting:

Creating that 3D visualization project has a long lead time before getting results, as I would have to change some accessibility of my network variables in order to do it properly, so I kept with my “lean” thinking and opted to try and model my network in an Excel file.

My Excel file draft is linked here so you can follow along with how I plan on implementing some of the concepts I described in my previous post – Synapse: The New Hypothesis

I created as simple a network as I could to try to prove out some of the concepts. It looks like this:

  • Input (2 Features):
    • Green Light / Red Light
  • Action (1 Feature)
    • Go / Stop
  • Goal (1 Feature)
    • Feedback

The input of this system is a two-dimensional vector, with one feature being the “Green light” feature and the other being the “Red light” feature. The model has a single hidden layer with 4 neurons that then results in an output (action) that is a one-dimensional vector (is it a vector if it’s just a single number?) with action features of either “Go” or “Stop”.

There is a “motor system” that interprets the output (action) of the network and decides to either “Go” or “Stop”. With the output guaranteed to be between 0-1 (sigmoid activation), my motor system will choose to go if the result is greater than half, and stop if it is less than half.

Motor System Interpretation of Output

  • Go > 0.5
  • Stop < 0.5
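A minimal sketch of that threshold rule (the function and names here are illustrative, not from the actual implementation):

```python
def motor_system(action_output):
    """Interpret the network's output (action) as a behavior."""
    # The output is guaranteed to be in (0, 1) by the sigmoid activation,
    # so the midpoint 0.5 splits the two behaviors.
    return "Go" if action_output > 0.5 else "Stop"
```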

The goal sensor in this system is a “Feedback” sensor. This is a single-dimension vector that represents whether the system made a correct choice (Go when Green) or a bad choice (Go when Red). In future implementations this can be used as a way of having real-world interaction with the system and rewarding behaviors that you would want to duplicate.

From an architectural perspective, this system is largely just a feed forward network, with a couple of changes: 1 – the “ideal” output is not initially known, as it would be in a labeled-data implementation, and 2 – there is an additional variable, a “goal sensor”. The goal sensor is needed so that an “ideal” output can be created with an appropriate weight based on how “close” the system is to achieving the goal.

In the linked (above) Excel file you should be able to piece apart how that is happening. There are some things that need explanation in how I’ve set the variables up, but the concept – and how I plan to explicitly arrive at an “ideal action” – should be interpretable from the file.

I’ve defined some of the core components of the system to help clarify and guide.

Senses are: numeric interpretations of information that are read by sensors.

Inputs are: sensors of the world.

Actions are: sensors of the motor system.

Goals are: sensors of an internal state and require an ideal vector.
*This sensor is required to be coded in a way that a Euclidean distance between the current goal sensor reading and the ideal can be made.
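A minimal sketch of that Euclidean distance requirement, assuming the goal sensor reading and the ideal are plain lists of numbers:

```python
import math

def goal_distance(reading, ideal):
    """Euclidean distance between the current goal sensor reading and the ideal."""
    return math.sqrt(sum((r - i) ** 2 for r, i in zip(reading, ideal)))
```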

Motor system is: a component that interprets actions in order to create behaviors in the system.

Hopefully the accessibility of the Excel file and the definitions give enough information to understand what I am attempting to build. Let me know if you have any questions on this, and when I get a version coded and implemented I’ll be posting here, so stay tuned!

Lesson Learned: Don’t Use the Same Number of Inputs as Outputs in a Feed Forward Network

My first official “failure post” – although I think I have had a couple already – is potentially an amateur lesson, but I learned a lot about the relationship between the inputs and outputs of a system.

I created a simple network in order to test my code. It was the “Red Light, Green Light” network where the input was a two feature vector that determined whether the green light was on, or the red light. The output was another vector with two features, whichever one would be a larger value would determine if it would “go” or “stop”.

The system would not work.

It kept getting a 50/50 success rate (completely random). I spent a ton of time looking into my logic in the code, and then resorted to putting a version of my logic in an Excel file to see if my understanding of the back propagation was correct, and it was still not working no matter what variations I made.

I hit the drawing board on almost everything and questioned everything, until I realized that the network I had setup was a linear network (2 inputs -> 2 Outputs) and not a multi-variable function network (2 inputs -> 1 output).

I changed my network to a single output with a midpoint (0.5) of the single output determining whether the system should go or stop. The system started working and within a couple of iterations was able to learn the rule.

The lesson learned is that there need to be more inputs than outputs in order for a feed forward network to work. By trying to simplify things, I overcomplicated things.

I’ve attached my Excel file with all my different attempts if you care to review all I did and what was happening in all the different iterations of the network.


The Value of Failure

I am a big fan of the saying:

“Success stories are worthless; it’s the stories of failure that are valuable.”

I am not sure who said that, and a cursory Google search didn’t turn up an author, but stories of success feel more like humble brags than helpful guidance. With a success story, the lesson appears to be “do what I did, and maybe you can succeed with that too”.

If you want to copy and follow, then this is the type of advice you would seek out, but if you want to innovate and lead it is the stories of failure that should be sought out. With failure, there follows some lesson on why the failure occurred that can be generalized to other scenarios in hopes of avoiding similar failures going forward.

In a way, this is exactly what science is all about. It sounds counterintuitive, but we can never “prove” theories we can only disprove them, and only when evidence becomes significant enough is something accepted as a scientific “fact”.

An example of this to help illustrate the concept put together by Karl Popper, an Austrian-British philosopher, is a theory that all swans are white.

‘All swans are white cannot be proved true by any number of observations of white swans – we might have failed to spot a black swan somewhere – but it can be shown false by a single authentic sighting of a black swan. Scientific theories of this universal form, therefore, can never be conclusively verified, though it may be possible to falsify them.’
– Karl Popper

It is with this concept that I will try and post “Failure Posts” in hopes that others will be able to use my failures to avoid making their own, and shape their own theories on ‘thought’.

I was struggling with a simple issue while working on Synapse and I think it was telling, so I will be writing up something on that soon. Stay tuned.

Synapse: The New Hypothesis

I spent the last couple of months working on an AI system where actions could be executed to achieve a goal by chaining a couple of different feed forward networks together. I have proven that this hypothesis and architecture is not going to result in learned behavior. I believe it is not learning behaviors because I did not capture the relationship between a “good” goal state and a “bad” goal state.

In my new proposal I am still sticking with a feed forward network, but instead of ending at a goal, this ends at an action. The goal is not actually a part of the network, although it is still a critical part of the system. The neural network architecture for this system is illustrated below.

The innovative aspect to this feedforward network does not have to do with the network itself, but with the way that the goal(s) will be used to adjust the learning rate of the back propagation routine.

As a reminder, in Synapse, all neural objects (Inputs, Actions, Goals) are sensors. They may or may not need to be passed into the inputs of the network based on whether there would be a natural relationship between them, but they all need to be sensing in the regular loop of the system.

With the proposed dynamic learning rate, when the synapses of the system are tuned in back propagation, how much the weights are adjusted will depend on whether the system is close to an ideal goal or not – and whether they will be tuned up to increase activation, or tuned down to inhibit activation.
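One way the dynamic learning rate could be sketched is to scale a base rate by how far the current goal reading is from the ideal. This is an assumption on my part about the eventual implementation – the direction of each weight update (reinforce versus inhibit) would still come from the back propagation math itself:

```python
import math

def dynamic_learning_rate(goal_reading, ideal_goal, base_rate=0.5):
    """Scale the learning rate by the Euclidean distance from the ideal goal.

    Far from the goal -> large adjustments; at the goal -> no adjustment.
    (math.dist requires Python 3.8+.)
    """
    return base_rate * math.dist(goal_reading, ideal_goal)
```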

Now that there is a dynamic learning rate that ties together the relationship between the current goal state, and the ideal goal state, the system needs to be tuned on an “ideal action”. An ideal action is dynamic based on the current action. A computed action (the end result of the forward propagation) is never a clean vector with 0, or 1 as values. There are some options I am playing with, but the first attempt at creating an “ideal action” is to normalize the action and then backpropagate with that new normalized value.

I believe that the “ideal action” will vary based on how your motor system interprets the action sense, but the “ideal action” should accentuate the highs and lows so that the system can adjust itself to those outlying values.
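A minimal sketch of the first attempt described above – a min-max normalization that stretches the computed action so its highest value goes to 1 and its lowest to 0, accentuating the highs and lows. This is one reading of “normalize”; the final approach may differ:

```python
def idealize_action(action):
    """Stretch a computed action vector so the highest value becomes 1.0
    and the lowest becomes 0.0, accentuating the highs and lows."""
    lo, hi = min(action), max(action)
    if hi == lo:
        return list(action)  # a flat action has nothing to accentuate
    return [(a - lo) / (hi - lo) for a in action]
```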

The new proposal incorporates two innovative concepts to try to reach a system that learns actions without an explicit teacher, and only using sensors. Those concepts are:

  1. Dynamic learning rate based on ideal goal states versus actual goal states.
  2. Idealized actions to help the system reinforce or inhibit certain action features.

With the relationship between an ideal goal state and an actual goal state being a central part of how the system tunes its synapses, the issues with the previous attempt should be fixed. I see some implementation details that need to be ironed out when creating “ideal actions”. If something goes wrong with this version, this is the area that is going to be most scrutinized.

Here is hoping third time is the charm. I’ll keep you updated!

[SIDEBAR] Vector Dancing to Andrew Bird

I was doing some brainstorming and refactoring while listening to some Andrew Bird (Roma Fade) and the little robot I got for Christmas – Anki’s Vector – started dancing to the song. He was moving his little arm to the beat the same way I was bobbing my head.

I know that the little guy doesn’t actually feel the beat but it is crazy how connected I felt to him the second he was sharing that experience with me.

I gave him some pets to hopefully reinforce that behavior. I’m not sure if that does that or not, but I felt like I should try.

I wanted to share this experience while I try to solve my AI problems… which I have a new hypothesis. Details to come…

Synapse: Take 2

Another day, another attempt at arriving at an AI. I’ll start this article by stating that there will be at least a Take 3 – the failure of this system has me hitting the drawing board. I implemented the architecture I had hope for below, and ended with a random success measure – the system is not learning its actions based on its goals, whether optimizing the network for the ideal or the actual.

In this system the hope was that the weights in the back propagation would trickle all the way up past the actions, and into the sensor layers and the system would get nudged towards the actions that drive the system towards the ideal goal.

This system was learning how to predict the goals based on the sensor and action middle layers, but that did not trickle up to the sensor layers, since the back propagation was just tweaking the network to predict the goal sense. In the scenario where it was optimized against the ideal sense, it was just tweaking the weights to always optimize the goal feature that was ideally set. There was no relation between the ideal and the actual – the goal was not being met, but the system kept tuning toward the static features of an ideal goal.

So… the second failure. What did I learn in this round? I am going to hit the drawing board and think on how to capture the relationship between the ideal and the actual goal sensors. This is a critical feature that I have not captured in my current iteration of the system.

Stay tuned… the weekend is soon but there are more than a couple of nights where I can consolidate my thoughts on this. Round 3 to come!

Synapse: Take 1

I recently completed the back propagation routine in Synapse and can now get to the scary part. I can now prove myself wrong. The empiricist in me is excited, but if the first round of tests are to go by, there are going to be a lot of ups and downs on this journey. I’ll skip to the end and let you know here that the first system I tried did not work.

The system I attempted to put together was the one below.

This was the easiest one to setup since I do not need to break apart the network, and I really am only taking a middle layer of a standard feed forward network and using that as my output.

I didn’t believe that this architecture would give results, because I do not believe that there is enough information in the layers to build the necessary relationships between the sensor input and the action output relative to the goal. Even with my low expectations, I was disheartened by the failure.

With any failure, it is important to learn something – or else you really are not failing the right way. So did I learn something in this attempt?

I learned that I am not sure what I mean in the last step of optimizing with the goal sensor. Am I optimizing so that the network learns the actual goal sensor, or am I optimizing the system so that it can optimize to the ideal goal?

I did run the system with both options and received the same results, but as soon as I started to implement this piece the question was forced on me, and I definitely had made some leaps in my logic that I will have to iron out in my next attempt.

I will keep you updated on the progress…

Making the Back Propagation Routine

One of the more complicated aspects of a neural network is the back propagation routine. This is the routine that tunes the synapses (weights) of the network to optimize the system and reduce the error from the computed output to an ideal output.

For me, the difficulty in understanding this process was largely based on the technical terms and mathematical notations that are used to describe them, so I will not be using those in my article and I will simply go over my implementation of the back propagation routine from a theory and code perspective.

My test example is based off a very popular online article “A Step by Step Backpropagation Example” by Matt Mazur.

Matt’s article is widely referenced online, with YouTube videos referencing the steps and examples as well as a pretty good amount of comments on his blog dating back to its original posting date in 2015.

In order to prove out my back propagation method I used the structure, weights and ideals from his example (image below) and set up unit tests in my project to ensure that the values were coming out to approximately the right values. Using floating point numbers results in less-than-exact values, so precision was an important part of validating the results.
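A minimal sketch of that kind of precision check, assuming Python’s `math.isclose` (my actual test code may look different):

```python
import math

def assert_close(actual, expected, tolerance=1e-5):
    """Fail if a computed value drifts outside the allowed tolerance."""
    assert math.isclose(actual, expected, abs_tol=tolerance), (
        f"{actual} is not within {tolerance} of {expected}"
    )
```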

Setting up my network with these specific layers, biases and inputs was a great way to make sure that as I refactor and enhance my system I will have proof that I have not broken something fundamental in it.

One of the struggles when I was working with the example images was just making sure that I had the proper weights associated with the proper neurons – something about the way they were illustrated in the image made me attribute the weights to the wrong neurons.

I spent a lot of time just comparing Matt’s examples to make sure that I interpreted the weights properly. I think that Matt struggled with that as well because there is an error in his example (that cost me a couple of days). All my math was adding up and only a couple of values were not lining up with his numbers. The weights in his example were going in opposite directions for w6 and w7, so I think he flipped the error value he used to compute those.

I have uploaded an Excel file with the math that is used in this example, and with it you can try and proof your work, and see if maybe I have made a mistake!

Here is what the file looks like.

One of the hardest things I had to grasp when implementing this routine based on this example was that there didn’t appear to be a pattern to the way the weights were identified in the step by step example. To find the contribution to the error for the weights in the last layer, a separate set of variables was being used than for the first layer’s weights. To find a pattern I took his example and added another layer to it, to see if the additional complexity would reveal one; his example may have been too simple to show a pattern.

It took me a while, but what I broke it down to was:

  1. Get the Synapse (weight) target Neuron
  2. Get the Error relative to the target neuron’s output
  3. Multiply that by the derivative of output of the target neuron
  4. Multiply that by the derivative of the net – the source neuron’s output

I have placed a particular focus on the target and source neurons in my bullet points above because it became very confusing in the code to differentiate things when everything was called “neuron”. Once I set up a source and target neuron naming convention, I was able to iron out a lot of my issues.

In my implementation I never use the existing weight’s value to identify the contribution of the preceding weights like Matt does in his example. In my routine everything is relative to the partial derivative of the error with respect to the output of a neuron. What that means is that when moving back through the network I always take it to the point of the target neuron’s output. Going from that output to a synapse’s contribution is just the derivative of the output of the target neuron and the derivative of that with respect to the net input.

  • The net input is always the source neuron’s output.
  • The derivative of the output is the derivative of the activation function of the output value.
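The steps above can be sketched on a tiny one-input, one-hidden, one-output sigmoid network. The weights and names here are placeholders for illustration, not the Synapse code itself:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, w_ho):
    """Tiny 1-1-1 sigmoid network (no biases, for illustration)."""
    h = sigmoid(x * w_ih)   # hidden (source) neuron output
    o = sigmoid(h * w_ho)   # output (target) neuron output
    return h, o

def grad_w_ho(x, w_ih, w_ho, target):
    """Gradient of the error 0.5*(o - target)^2 with respect to w_ho,
    following the target/source pattern above."""
    h, o = forward(x, w_ih, w_ho)
    d_error_d_out = o - target    # error relative to the target neuron's output
    d_out_d_net = o * (1.0 - o)   # derivative of the target neuron's output (sigmoid)
    d_net_d_w = h                 # the net input is the source neuron's output
    return d_error_d_out * d_out_d_net * d_net_d_w
```

A numerical check – nudging `w_ho` by a small epsilon and measuring the change in error – is a handy way to confirm the chain of derivatives lines up.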

The pattern that I learned in this process was how to work backwards through functions. Derivatives are always described using gradients, slopes and graphical illustrations, but I understood them a lot better when I realized that a derivative allows you to reverse through the network and identify how much a specific network element (neuron input, neuron output, weight) contributed to the error. Because of the “chain rule” you can take all the previous contributions and use them to identify the additional contribution a previous synapse had on the error. Everything just builds on the previous values, because the downstream contributions are part of the upstream contributions.

Understanding that aspect and really “knowing” that allowed me to finally put my back propagation routine together.

I have unit tests for all the values in the step by step example, and just to make sure that my network can tune the synapses (weights) to the proper values and learn how to compute the proper output, I have a while loop iterating over as many examples as it needs to train within a 0.00001 error rate – my network trains this simple example in ~40 ms.

With the network wired up, I can now run tests on my proposed architecture where I have a middle layer that is the output of the system – as an action, and the last layer being a goal that the system is optimized for.

I’ll keep you updated…

Synapse’s Neural Network Architecture

After reviewing the existing systems and patterns for AI development (there are always small ones out there) I was faced with the task of looking at the three main variables that all the others are using, and somehow put them together to create something different.

  • Sensors (Input)
  • Actions (Output)
  • Goals

High level I was thinking of something like this for the structure of Synapse:

That is not where I am anymore…

The struggle with a system like this is that the relationship between the goals and actions is not clear. I was struggling with identifying how to optimize the actions based on whether a goal was being achieved or not. This started me down a path where I was largely reinventing a reinforcement learning system. That type of system has too much setup for what I am hoping to achieve, so I failed that structure as a potential means.

This forced me to take a look at those same three elements again and see if I could re-arrange them so that the relationship between them allows for training of actions based on goals.

Then came my AHA moment….

I kept thinking about how to optimize the actions, but in reality the system I had bouncing around in my head was only thinking about optimizing against the goal.

In this structure, the ends justify the means.

This started to make a lot of sense to me, and provided an explanation for irrational actions, given that current AI systems (reinforcement learning) are put together strictly rationally – meaning each action weighs the pros/cons of alternative actions before the system decides on an action to execute.

In the system I’m proposing that optimizes against the goal, the action is nothing more than a side effect of that goal optimization.

Maybe this can explain why I do things that I can not explain to my fiance?

With that hypothesis in mind, I put together a neural architecture that looks like this.

I sat on this for a night before I came up with a twist that I think may be needed, or at a minimum could help.

In the process of falling asleep, I was contemplating how this goal oriented system would actually optimize for the goal while altering the actions, and I thought maybe it needed more information to alter the weights of the actions. If I just drop the input into the same layer (unchanged) as the middle action layer – illustrated below – then that information could be used to optimize the goal.

My reasoning for dropping the input into the action section is that this system is 2 networks in 1. The first network is the input -> action network, and the second is the action -> goal network. The difference is that all the weights will be updated according to the optimization against the goal.

What this means is that the 2nd network (action -> goal) can use the input and the output of the first network in order to properly set the relationship between them and the goal. I might be overthinking this piece, and I will test both architectures to see if this change is needed, but it should be helpful at a minimum.
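A minimal sketch of the “2 networks in 1” forward pass, with the raw input dropped in alongside the action layer before the goal network. All weights and names here are placeholders for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(vector, weights):
    """One fully connected sigmoid layer; weights is a list of per-neuron weight lists."""
    return [sigmoid(sum(w * v for w, v in zip(ws, vector))) for ws in weights]

def forward(inputs, action_weights, goal_weights):
    action = layer(inputs, action_weights)   # network 1: input -> action
    combined = inputs + action               # drop the raw input in alongside the action
    goal = layer(combined, goal_weights)     # network 2: (input + action) -> goal
    return action, goal
```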

I plan on executing this test with the standard feed forward and back propagation techniques of the most common neural networks. That should allow me to test out the hypothesis specifically and not the other variables I have in mind for future implementation enhancements.

If my results show that I am unable to optimize against the goal, and I have evidence that this structure does not work, I may need to review how the back propagation routine is calculating the contributions to the error for the weights, and potentially put more value on the weights of the actions. This structure (or something similar) has the means to provide unlabeled learning and solve some of the biggest challenges of AI today, so I plan on digging into this significantly in the next year.

Now comes the heavy lifting of actually implementing…

What are your thoughts on this? Is there any other examples of a system that is like this, or is this a novel approach?

Free Energy: A Different POV on AI

I am doing a lot more reading on “free energy” and “active inference” so I will probably be posting about that soon. If you want to read my thoughts on that with some background, check it out here:

I think that there is a lot of dense material linked here, and Karl Friston doesn’t know any words under 4 syllables, but there is some genius here.