Supervised vs Unsupervised vs Reinforcement Learning

I am looking to implement my own pattern of AI, which I am calling Synapse.

With that audacious goal, I figured I would take some time and spell out how I see these existing patterns and systems and the different implementations versus the planned implementation of Synapse.

Supervised Learning

Supervised learning is the most straightforward and common implementation of “AI”. I am not sure of its history or who built the first version of this form of machine learning/AI (if you know, please let me know!), but it is very common.

This AI pattern requires “labeled data”: the answer key for what is represented in the distributed processing network (or neural net, or whatever you want to call it). The method learns how to process the image/sound/data through training sessions, and after it has trained enough it can start processing new data. Newer supervised learning models have been trained to perform better than humans at tasks like categorization (hot dog / not hot dog) or prediction. Even humans are not perfect at these tasks, so we are now able to train these networks to be better than the average human (scary?).
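To make the “answer key” idea concrete, here is a minimal sketch in Python. It swaps the neural net for a much simpler nearest-centroid classifier, and the two features (length/width ratio and redness) and their values are made up purely for illustration.

```python
from collections import defaultdict

def train(examples):
    """Learn one centroid (mean feature vector) per label from labeled data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical labeled data: ([length/width ratio, redness], label)
labeled = [
    ([3.0, 0.8], "hot dog"),
    ([2.8, 0.9], "hot dog"),
    ([1.0, 0.2], "not hot dog"),
    ([1.2, 0.1], "not hot dog"),
]
centroids = train(labeled)
print(predict(centroids, [2.9, 0.7]))  # a new, unlabeled example
```

The labels are the only reason the system can name its answer. Take them away and `train` has nothing to learn from.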

Unsupervised Learning

Unsupervised learning is related to supervised learning in that the system is also trained on a network of neurons. The difference is that this type of system does not need “labeled data”; it just learns the relationships within the data. What this means is that it can group the data into “like” items, categorizing them.

An unsupervised system is generally used for categorization or “clustering”. It can be used to approach the hot dog / not hot dog problem without data that is specifically labeled as hot dogs or not. The system learns the patterns in the images of food and then uses those patterns to group the images. It will not know what to call each group, only that the images in it look alike.
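As a rough illustration of grouping without labels, here is a minimal sketch of a k-means loop with two clusters, written in plain Python. The function name `two_means` and the sample measurements are hypothetical, and real image clustering would work on many features rather than one number per item.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups using a basic k-means loop (k = 2)."""
    centers = [min(values), max(values)]  # simple deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

measurements = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]  # made-up, unlabeled readings
print(two_means(measurements))
```

Nothing in the input says which group is which. The two groups fall out of the data, and a human still has to decide what each one means.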

I have not found an implementation of a business solution that is 100% unsupervised. It seems like this system is mostly used for exploring data and finding patterns that weren’t directly evident or easily described. After identifying these features, they can be used in conjunction with a supervised learning system.

Reinforcement Learning

I am relatively new to reinforcement learning, and when I was reading about it I got really excited because it seemed to line up well with the thoughts I had about an AI system. It had actions, goals, and a separate environment.

I was quickly let down.

The terms that are used were on point, and I am using a lot of the same terms in my system (action, reward (goal), environment), but the amount of structure that has to be set up for this system is significant.

In a reinforcement learning system, you have to map out the environment and outline the rules of interaction within it. With those rules established, the system uses a bit of a brute force approach to work out what action an “agent” should take to maximize the rewards in the environment. There is a lot of setup in this system, including configurable constants that determine how much future rewards should be valued relative to current rewards. (A bird in hand is worth two in the bush.)
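One of those configurable constants is usually called the discount factor (often written gamma). The sketch below shows the standard discounted-return calculation in Python; the reward lists are made up to mirror the bird-in-hand saying.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, valuing a reward t steps in the future at gamma**t of its face value."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

bird_in_hand = [1, 0, 0]  # one bird right now
two_in_bush = [0, 0, 2]   # two birds, but two steps in the future

for gamma in (0.5, 0.9):
    hand = discounted_return(bird_in_hand, gamma)
    bush = discounted_return(two_in_bush, gamma)
    print(gamma, hand, bush)
```

With gamma at 0.5 the bird in hand wins (1.0 against 0.5). With gamma at 0.9 the two in the bush win (about 1.62 against 1.0). The agent's preference flips purely because of a number someone configured by hand.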

My big problem with these…

The biggest problem I have with these systems is the amount of setup and configuration they all require. When I think of an AGI system, I imagine starting with the smallest elements of input and, with a properly structured system, the items that are configurable in these systems would be emergent features of the structure of the system.

An example of this would be the brain’s working memory. The classic estimate is that working memory holds 7 items, plus or minus 2 (5-9). There is no evidence that there is something specific in the brain with 7 open slots for the 7 things you can keep in mind at a time. Most likely it is a side effect of the way the brain processes data and not a specific “thing” in the brain.

Why will Synapse be different?

With Synapse, I am taking an approach of defining the things that I believe exist in the brain, and have some of the features of some systems become emergent properties of the system.

Synapse has neurons and connections between them, architected in a way to fulfill the goal. The trick is in structuring or architecting the same set of objects that have been in use in AI systems for years, since those are all things that you can look at a brain directly and see. There is no ambiguous concept in the structure of the system. As things start coming together, I am hoping to see those amorphous concepts start to be represented in the processing of the system, and not as a specific configuration of the system.

I have a lot of ambition for this project and am using these posts to think through the things about existing systems that I find useful, as well as things that I see as shortcuts in the process of arriving at an intelligent digital “agent”. If you think the same way as me, or not, post a comment and we can have a conversation.

Compromising on an AGI Definition

I was on my way to Northwest Arkansas for work and had some time on the plane so I started working through some of the struggles I’ve had regarding making an AI platform. Mainly the struggles in not breaking the definition that I put together originally. I’ve had a couple of quick pivots recently, and that started to get me thinking.

With a couple of hours to burn, I had the time to focus and I had a pretty big breakthrough on some of the architectural details in the neural network implementation of the system. It’s a network setup that allows me to keep my system general without abandoning the original definition. Before this breakthrough, one of the strategies that I was toying with in my search for answers, was to “reframe” the requirement or in this case redefine AGI, in order to make something “work”.

I put that in quotes because it is important to realize that by making it “work” I was actually admitting failure against the definition. I was being pressured into solving a problem that there was no clear answer to. I am not proud of this, but when the going got tough, I was looking for a solution that was “good enough” and got me “pretty close” to a system that functioned the way I envisioned and originally defined. In many cases “good enough” works well enough in a business scenario. In these situations, though, we do ourselves a disservice by not fully respecting the requirement, honoring the original vision, and pushing ourselves to the best solution.

If we treat requirements as pass or fail, and not as something to reach a “close enough” compromise on, it allows us to push ourselves past struggles and arrive at innovative solutions. If we accept a “good enough” solution, we are giving ourselves the easy way out and not fully testing our abilities.

In a scenario where you believe that you have reviewed all possible options and cannot fulfill the requirements, pause for a minute and back away from the problem to try to get a new point of view. Review whether the requirements are impossible by definition (a circle cannot be a square) – or if there is a missing piece of the puzzle that is still to be identified.

I experienced this recently at my day job where a developer was struggling with an equation for handling feed consumption and deliveries for chickens, while at the same time handling logistical needs of the trucks needed to deliver the feed. A lot of variables to try and juggle at once.

The requirement was to allow an operator to set the delivery date manually, instead of using the automated delivery dates that the system currently produces. This is tricky because there are a lot of variables that need to be calculated to make sure that the birds do not starve. If a delivery is needed prior to the set delivery day because the birds are eating much more than anticipated, the system needs to accommodate that.

One of the expectations of this system is that after setting a date – and not changing any other variables – the number of deliveries and the amounts to be delivered would not change over the life of the flock. The developer was struggling, because these values were changing – not significantly – but enough to be noticeable.

This developer was frustrated and trying many different complicated ways of adding things up properly, and was convinced that there was no way that the numbers would ever be able to stay the same because there were too many variables. To me, no matter the number of variables involved, the only variable that changed was setting the automated date to manual – so effectively this was a single variable change. Something was off, and the implementation was wrong.

After focused thought and a review of the potential solutions the developer was working on, he and I arrived at a change in a small piece of the code – one line being moved out of one loop and into another – and were able to fix dates for feed deliveries without changing any of the other fields.

This chicken story has parallels in the world of AI. We are still compromising on the solutions we are making because we are not having the right conversation and narrowing down which variables are valid and which are not variables at all, having been added only because of a compromise on the definition. With the three major types of AI architectures out there (supervised, unsupervised and reinforcement learning), the compromises are clear. To be fair, many of these systems were not and are not designed to be AGIs, but with some discussing AlphaGo (an advanced reinforcement learning system) as the next coming of an AGI system, I have to put a critical eye to it.

Supervised Learning

  • Compromise on Data
    • The answer has to be given, which results in a chicken and egg scenario. If all knowledge needs a supervisor, then who was the original supervisor?

Unsupervised Learning

  •  Compromise on Goals
    • This system lacks a definition for a flexible set of goals. Its architecture is designed for classification and does not generalize well to many other tasks.

Reinforcement Learning

  • Compromise of Environment
    • This system requires manual creation of the rules of the environment. AlphaGo handles a VERY complex system, but it is still nothing close to as complex as the real world.

I will continue to focus on the definition of AI and will not accept manual intervention in the setup of the environment, manual testing of whether the system is providing a correct answer, or a failure to broaden its generality to other actions. These challenges will need to be addressed in order for AGI to be realized.

With this less-is-more approach to AGI, your work will be better served by limiting its variables. Parsimony is critical. The simplest solution is always the most right solution. If you are working on a system where you keep creating more setup and configuration, you are starting to go down the wrong road. To put it brashly, an AGI should be “born” and then it should evolve from there.

Thoughts? Have you ever had this type of a scenario where you just put the headphones on and focus intently for a couple of hours walking away with a completely different point of view on a problem?

Pivot on a Pivot… Creating an automated Supervisor for labeled Data

Last time you heard from me I had to pivot due to some technical limitations of the Unity3D engine and my choice of neural processor (Encog).

Today I pivot again. The difference between my original plan and this new one is significant, as my original plan was an unsupervised network, and this new approach is a supervised network.

I thought I could make it work... and I could... technically, even if it is a bit of a shoehorn. But I thought of something that fits the model a lot better: an automated supervisor, or Robovisor. I will build a literal supervisor that will provide feedback to the network based on whether the Robovisor likes the action of the system, or in this implementation, whether the number that the system returns is the actual number. If it is, the system will be given positive feedback, and if it is wrong, it will receive negative feedback. The goal of the system is to keep receiving positive feedback.

I thought of this feedback sensor previously as I was brainstorming ways to coach an embodied system with Synapse. I imagined that if I liked the way that the truck took a turn in the game, I would have an app where I could give the truck a “thumbs up”, providing it positive feedback. I am going to expand this concept: instead of me providing feedback for the truck, I will build a supervisor that knows the number written in the MNIST image and provides positive or negative feedback based on the action of the system.

I’m not sure if this will work or not, but this is a lot more interesting to test than the standard labeled dataset supervised learning examples that exist everywhere. I think that this is a novel approach and am looking forward to how it works out!

New layout for MNIST Implementation

Sensors

  • Number Sensor (0-9) [This is for the Supervisor]
  • Image Sensor (28×28 pixel images)
  • Feedback Sensor (-1, 0, 1)

Actions

  • Interpreted Number (0-9)

Goals

  • Positive Feedback (Goal of 1 = Correct Answer)
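The feedback rule the Robovisor applies can be sketched in a few lines of Python. The function name `robovisor_feedback` is hypothetical, and treating a feedback value of 0 as “no answer given yet” is an assumption; the layout above only lists the three possible values.

```python
def robovisor_feedback(true_number, interpreted_number):
    """Return +1 for a correct answer, -1 for a wrong one, 0 when no answer was given.

    The meaning of 0 is an assumption made for this sketch.
    """
    if interpreted_number is None:
        return 0
    return 1 if interpreted_number == true_number else -1

# The Robovisor reads the Number Sensor (the label) and compares it to the action.
print(robovisor_feedback(4, 4))     # correct answer
print(robovisor_feedback(4, 9))     # wrong answer
print(robovisor_feedback(4, None))  # no answer yet
```

Note that the network itself only ever sees the feedback value, not the number the Robovisor knows.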

Do you think that this new approach will work? Have you seen something like this implemented before?

Creating the MNIST Sensor, Actions & Goals

Since my pivot from building an AI for my racing game, and starting with the “hello world” AI training data (MNIST) I have started to get into the first step of what anyone who would use my platform would do – Create the Sensors, Actions & Goals.


On my commute to work I have been thinking about the many different ways to encode visual information, and I am looking forward to trying different types out, but I think that the most straightforward sensor is one that will interpret the grayscale 28×28 pixel image as a 784 feature vector (28×28=784). Each pixel will be represented by its own feature in the vector.

The first 28 features will be the first row of pixels in the image, and the 2nd 28 features will be the 2nd row of pixels. The number in each one of those features will be a value from 0-1, with 0 being a white pixel and 1 being a black pixel. A gray pixel would be 0.5.

[Figure: MNISTExample] An MNIST image broken down into each pixel. In this example, the first 5 rows are all 0 value, as the 4 is not visible until the 5th row.
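The encoding described above can be sketched as follows. This assumes the raw image arrives as 28 rows of 28 integers from 0 to 255, with 0 as white and 255 as black; the function name `image_to_features` is hypothetical.

```python
def image_to_features(image):
    """Flatten a 28x28 grid of 0-255 darkness values into 784 features in the range 0-1."""
    features = []
    for row in image:        # row-major: the first 28 features are the first row
        for pixel in row:
            features.append(pixel / 255)
    return features

# A blank (all white) image with a single black pixel at row 4, column 10.
blank = [[0] * 28 for _ in range(28)]
blank[4][10] = 255
features = image_to_features(blank)
print(len(features), features[4 * 28 + 10])
```

A pixel at row `r` and column `c` always lands at index `r * 28 + c`, which is what lets the network treat each position in the image as its own feature.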


The action (output of the network) will be a representation of a number (0, 1, 2 … 9). This means that there are 10 different potential results. The simplest and most straightforward way of encoding this information is in a 10 feature vector. The first feature will represent 0, and the last feature will represent 9.

[Figure: MNISTAction] An example of an action which has 4 as the most likely number, and 9 as the 2nd most likely.

This means that the structure of this network will be a 784 feature vector that goes through 2 hidden layers (the default for Synapse, reasoned from Jeff Heaton’s findings) and results in a 10 feature vector.

The “motor system” will interpret this 10 feature vector and will print the number to the screen.
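Here is a minimal sketch of that shape in plain Python: 784 inputs, 2 hidden layers, 10 outputs, followed by a motor system that picks the strongest output. The hidden layer width of 32, the sigmoid activation, and the random untrained weights are all assumptions for illustration; none of them come from Synapse itself.

```python
import math
import random

def make_layer(n_inputs, n_outputs, rng):
    """Create small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def forward(layers, features):
    """Feed a feature vector through each layer, applying a sigmoid activation."""
    activations = features
    for weights, biases in layers:
        activations = [
            1 / (1 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations

def motor_system(action):
    """Interpret the 10-feature action as the digit with the strongest activation."""
    return max(range(len(action)), key=lambda digit: action[digit])

rng = random.Random(0)
sizes = [784, 32, 32, 10]  # hidden width of 32 is an assumed placeholder
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

action = forward(layers, [0.0] * 784)  # an all-white image as input
print(motor_system(action))            # untrained, so the digit is arbitrary
```

Until the network is trained the printed digit means nothing, but the plumbing from image features to a printed number is already complete.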

(I think I might need to add a diagram of my vision of the standard architecture at some point. Would that be helpful? If you want it let me know.)

[DISCLAIMER] Using “actions” to describe the output of Synapse in this supervised instance is not ideal, but since the goal of Synapse is to be embodied (in a virtual race car initially) I will continue with this naming scheme. Just be aware that the output of Synapse is an Action, and in this MNIST implementation, the Action is to interpret the handwritten image.


The goal of this network is to determine the correct number that is written in the image. Every goal in Synapse is required to have a sensor. My rationale is that, just like any goal in life, you can never know when you have achieved a goal if you never measure it.

I will need to take the label for the data that lets the system know what number is written in the image and encode it into a sensor. This goal sensor will then be what the network is optimized against when the system trains, or as I term it in Synapse, “sleeps”.

The goal sensor will use the exact same encoding pattern as the action. That is how the system will determine whether the goal was achieved for the image.


The first image is the goal sensor, the 2nd is the action. In this example the system properly interpreted a 4 as the most likely number.
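A minimal sketch of that comparison in Python is below. The function names are hypothetical, and using squared error as the quantity to optimize during “sleep” is an assumption, since the text does not say which error measure Synapse uses.

```python
def encode_goal(label):
    """One-hot encode the digit label using the same 10-feature layout as the action."""
    return [1.0 if digit == label else 0.0 for digit in range(10)]

def goal_achieved(goal, action):
    """The goal is met when the strongest action feature lines up with the goal's 1."""
    return action.index(max(action)) == goal.index(max(goal))

def squared_error(goal, action):
    """Distance between goal and action; a quantity that training could minimize."""
    return sum((g - a) ** 2 for g, a in zip(goal, action))

goal = encode_goal(4)
action = [0.0, 0.1, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.3]  # 4 most likely, 9 second
print(goal_achieved(goal, action), squared_error(goal, action))
```

Because the goal and the action share one encoding, the same pair of vectors answers both questions: was the goal achieved, and how far off was the action.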

I think that this setup will be a good initial test of the system. Would you have done it differently?

Adding Supervised Learning to Synapse

I started my AI platform (Synapse) with the understanding that I wanted to make an AI that paralleled some human constructs, as the field of AI has too many of its own terms, which makes learning AI more complicated than it needs to be.

This meant that Synapse would be an unsupervised system. The difference between a supervised and an unsupervised system is labeled data versus unlabeled data. An example of labeled data for supervised learning would be the CAPTCHA tests you have to pass to prove you are not a robot when logging into your favorite service. By selecting the images that all have bikes in them, you are helping label image data. Without the label, the system cannot learn what a bike is. Labeled data provides you the correct answer, while unsupervised learning doesn’t have any right or wrong – it just learns the features of the data being processed. [What’s the difference between supervised and unsupervised?]

I made a definition for AI, only slightly modified from an existing definition, and made a statement that all AIs could be defined with it – so does that definition apply to unsupervised AI alone, or supervised as well?

“Artificial intelligence is an entity, able to receive and process inputs from the environment in order to flexibly act upon the environment in a way that will achieve goals over time.”

With this definition it is important to identify the new environment (MNIST data) and determine what each of the pieces are.

  • Environment:
    • MNIST Dataset
  • Inputs:
    • 28×28 pixel images
  • Actions:
    • Print a Number (0, 1, 2 … 9)
  • Goals:
    • Accurately Interpret the Numbers

The trick for a supervised system in Synapse is that in the unsupervised implementation, the goal is usually an ideal sensor state. For example, in my Super Truckin’ AI, the vehicle’s speed has a sensor, and there is a goal represented in the system of maximum speed. The system (theoretically at this point) would identify the relation between hitting the gas and getting closer to the goal of top speed, and learn to act by hitting the gas.

In a supervised system, if I were to provide the actual number as the input goal in addition to the pixels of the image of the number, then it would learn to ignore image pixels, and would just repeat the goal as the number, which is essentially useless. That setup is like taking a Jeopardy test where you need to answer with a question when you are given the question already.

Since the goal is explicit in a supervised system, the system needs to optimize the output (action) with a dynamic goal (each image is a different number), and not a static goal (go fast), since the goal is different and explicit for each set of numbers.

This implementation of MNIST is not bringing anything new to the world of AI, but I plan on using the MNIST dataset as a test of my neural network. I’ll start with version 1 of the network using some basic parameters and as the system evolves, I can use this data as a benchmark of progress.

I’ll let you know after I implement the “pivot” in the system and add supervised learning whether I have to revise the AI definition or not, but I think I have re-framed the problem in a way that solves it for a supervised implementation, even if it breaks some of my architectural constructs.

Do you disagree? I hope so, because then one of us is going to learn something…

Pivot… Hello World Synapse, with MNIST

I started working on Synapse (AI platform) with the goal of creating an AI that would drive the vehicles in my racing game Super Truckin’. Unfortunately, after attempting to get Super Truckin’ up and running I have determined I will have to “pivot”.

When I updated Super Truckin’ to the new Unity 3D build, I was forced to update the Edy’s Vehicle Physics engine, and that broke everything. There was no “automatic update” of the changes, and there were significant architectural changes in the physics.

Another challenge I faced was that the libraries I was going to use for the feed forward and back propagation methods were in Encog (C#), and that library is completely incompatible with Unity, as Unity only supports .NET 2.0 and Encog is .NET 3.6. [Encog]

This doesn’t change my goal, as I was still planning on making my own feed forward and back propagation methods, but it does put it off for a while until I can build the neural network pieces of my platform.

With these bumps in the road, and my patience for seeing some results running short, I have now chosen to make my first complete project with Synapse one that is based on the MNIST data set, a “hello world” set of data for AI developers. For those not familiar, MNIST is a set of handwritten digits (28×28 pixels). [MNIST Data]

This does change my platform’s first implementation from an unsupervised network to a supervised network, since MNIST is labeled. That means the system isn’t going to “learn” intuitively what the number is; it will have to be told after each attempt whether the number it believes it sees is the correct number. This is a drastic change, but one that does test whether my original definition of an AI is still valid, or if it was only valid for unsupervised AIs. [Original AI Definition]

Is it going to be easy to switch a system from unsupervised to supervised learning? We will find out…


Sensor Types of Synapse (my AI platform)

I am building an AI platform with 3 main components: sensors (inputs), goals and actions (outputs).

The initial version of this system is being built to try to make an automated vehicle AI for the racing game I made about 5 years ago. The original AI in that game was built with waypoints and required a lot of setup and tweaking in order for the vehicles to make their way across the course. It did not have any “learning”, so it is an AI only in the videogame definition, not the machine learning definition.

I am planning on adding 4 different sensors to the vehicle.

  1. Location Sensor
    1. This reads the location on the course, like a GPS sensor
  2. Proximity Sensor
    1. This reads if something is within range of the vehicle
  3. Rank Sensor
    1. This reads the current race position (last place, 2nd place…)
  4. Speed Sensor
    1. This reads the speedometer

This is more than enough input for the system to optimize its actions and learn how to race along the course.

As I started coding these, I realized that I was going to encode the sensor readings to vector inputs in two distinct ways.

  1. Metered
  2. Discrete

Metered Sensor
The speed sensor is a good example of a metered sensor. In a metered sensor, the larger the reading, the more features of the input are activated; the smaller the reading, the fewer features are activated. If the maximum speed for a vehicle is 100mph and I have 20 feature dimensions on my sensor, then at 50mph the sensor reading would have the first 10 features activated.
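The metered encoding can be sketched like this in Python. The function name `metered_encode` is hypothetical, and rounding down to a whole number of features and clamping out-of-range readings are assumptions, since the text only gives the 50mph case.

```python
def metered_encode(reading, maximum, n_features):
    """Activate the first k features, where k grows in proportion to the reading."""
    clamped = max(0, min(reading, maximum))          # keep the reading within 0..maximum
    active = int(clamped * n_features // maximum)    # round down to whole features
    return [1 if i < active else 0 for i in range(n_features)]

print(metered_encode(50, 100, 20))   # half speed: first 10 of 20 features on
print(metered_encode(100, 100, 20))  # top speed: every feature on
```

Two readings that are close together share most of their active features, so the encoding itself carries the idea that 50mph is more like 55mph than like 5mph.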


Discrete Sensor
The location sensor is a good example of a discrete sensor. There should be no relationship between the location of the vehicle on the course and the number of features activated, so it would make little sense to activate more features if the vehicle is at the top of the course versus the bottom of the course. In this sensor type, each feature represents a specific position along the X, Y, or Z dimension.
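One way to read the discrete encoding is as one bucket per axis, with exactly one active feature for each of X, Y and Z. The Python sketch below follows that reading; splitting each axis into equal-width buckets and the function names are assumptions, not something the text specifies.

```python
def discrete_encode(value, minimum, maximum, n_features):
    """Activate exactly one feature: the bucket that contains the reading."""
    span = maximum - minimum
    clamped = max(minimum, min(value, maximum))
    bucket = min(int((clamped - minimum) * n_features // span), n_features - 1)
    return [1 if i == bucket else 0 for i in range(n_features)]

def location_encode(x, y, z, course_min, course_max, n_features):
    """Concatenate one discrete encoding per axis into a single sensor reading."""
    return [
        bit
        for coordinate in (x, y, z)
        for bit in discrete_encode(coordinate, course_min, course_max, n_features)
    ]

print(discrete_encode(50, 0, 100, 4))
print(location_encode(0, 50, 100, 0, 100, 4))
```

Wherever the vehicle is, the same number of features is active (one per axis), so being at the top of the course is not “more” than being at the bottom.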


Theoretically all sensors could be Metered, and the system could learn the discrete elements of the sensor reading. I am going to shortcut this but maybe this is worth testing after the system is up and running and I have the second phase of the system implemented, which will create more complex internal representations and “learn”.

I can’t think of any other sensor encoding types that one could build that would interpret raw sensor readings as vectors. So until the need is identified, these will be the base number sensors.

UPDATE: I did think of another way of coding a number. It could just be a single feature dimension. Pretty straightforward actually… I might have overcomplicated things. 🙂

Do you disagree? I hope you do because then one of us will learn something…


Definition and Elements of Synapse (my AI platform)


There is a really good definition of AI that has been put together by

“Artificial intelligence is an entity (or collective set of cooperative entities), able to receive inputs from the environment, interpret and learn from such inputs, and exhibit related and flexible behaviors and actions that help the entity achieve a particular goal or objective over a period of time.”

The reason I think it is so good is that it has the three elements that I have identified in my AI platform and in all other platforms, whether they are aware of it or not: inputs (sensors), actions (output) and goals (objectives). What a system made up of these things does is process them. This definition is the best one I have found, but if I were to criticize it, I think it is a bit too verbose. I would change it in this way.

“Artificial intelligence is an entity, able to receive and process inputs from the environment in order to flexibly act upon the environment in a way that will achieve goals over time.”

I believe that inputs, actions and goals are THINGS (objects) in an AI, and processing is the thing an AI does. I think that this definition is a little more direct than the original, and if I were talking to someone writing an object oriented solution it would be a bit more specific for them.

It is important to note that the action in an AI system is an object or THING, not a verb or actual action. An AI system only processes, and the outputs of the AI system are actions represented as objects. An action, if embodied, would then be interpreted by a motor system.

An oversimplification of an AI workflow would be something like the one below.


The Actions piece of this workflow would then be ingested by a motor system to execute in the environment, but the instruction for the action is the output of the AI system.
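For someone writing an object oriented solution, the workflow might be sketched like this in Python. Every name here (`Action`, `Goal`, `process`, `run_step`) is hypothetical, and the processing step is a deliberately trivial placeholder, not how Synapse processes anything.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    """An action is a THING: a plain vector of output features, not a verb."""
    features: List[float]

@dataclass
class Goal:
    """A desired sensor state that the system tries to reach over time."""
    target: List[float]

def process(inputs: List[float], goal: Goal) -> Action:
    """Placeholder processing step: propose the gap between the goal and the inputs."""
    return Action([t - i for t, i in zip(goal.target, inputs)])

def run_step(
    sense: Callable[[], List[float]],
    goal: Goal,
    motor: Callable[[Action], None],
) -> Action:
    """Sensors feed the AI, the AI outputs an Action object, the motor system executes it."""
    inputs = sense()
    action = process(inputs, goal)
    motor(action)  # the motor system, not the AI, acts on the environment
    return action

executed = []
run_step(lambda: [0.25], Goal([1.0]), executed.append)
print(executed[0])
```

The AI part ends at the returned `Action`. Anything that touches the environment happens inside `motor`, which is passed in from outside.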

Do you disagree? I hope so, because then one of us is going to learn something…

Changing the format…

This WordPress was put together initially to promote and demonstrate the progress of a project I was working on with my brother and friend. That project is scrapped, but the name and the site are too good to just let sit.

I am going to start to post my work on AI – so instead of “Developments on Development” it’s going to be “Thoughts on Thoughts”.

There will be some well thought out articles, and then there will be random musings and thought experiments. I’ve been working on my AI project largely by myself at home and there is a big risk of getting tunnel vision, so I am going to take advantage of Cunningham’s law and start putting my wrong answers up here and hope someone corrects me.

Enjoy! Hopefully we both learn something in this process.


IndieCade Feedback on Demo

We submitted a demo of the game to IndieCade for an upcoming IndieCade Festival and unfortunately didn’t make it. 😦

All is not lost though! We received some really great feedback and have since released a new demo taking the feedback into consideration. We thought we would share the feedback in its raw form and then what we did to try to respond to that feedback.

Feedback 1
The concept is solid as a multiplayer game, and a stands out a little (while there are plenty of chaotically fun local mutilplayer game, car-physics soccer is a least a little different, and cool). However the controls, specifically the car physics, while intended to be awkward, are effectively unusable w/intention – they accelerate so slowly, have such a high speed, and slow turns speed, but also lose control at the slightest collision, so scoring feels random because it often occurs by accident and not through player directed action. Making the game feel more like work, trying to swim upstream to control your car. It also feels like perhaps some deviation from basic soccer rules could be used in combination with the awkward physics to improve that, but other than the power ups that is an under explored area. Having a large number of local players through methods like using smartphones is a great idea, although the visuals also make it very very hard to distinguish your car (even with the show car marker functionality, which didn’t seem to work consistently).

Feedback 2
Super Truckin’ is a fun idea but I think it needs a little more work to get it into a more solid space. The controls feel a little too loose, with vehicles a little slow to respond, they feel sluggish and heavy with very wide turns. A reverse function would be very nice. The AI when we played two players with two AIs seems to be scoring against their own teams.  The music is fun but gets pretty repetitive. With sharper controls I think this game could be a lot of fun, it would feel like you had more direct control and thus could actually impact what was happening, as it is it feels like it’s more luck than skill if the ball goes into the goal.

Feedback 3
The core concept of the game is solid if slightly straightforward but there were problems in the execution. We were told that it could use mobile devices as an input but this didn’t add much as we chose to play on controllers for better response time and accuracy. My biggest problem in game was how difficult it was for players to exert any kind of meaningful agency on the game especially in matches with larger team sizes. The players car was very difficult to control and often overshot its target or was hit by an unavoidable collision. I think you could learn a lot from Fifa and the upcoming rocket league in terms of adding responsiveness and exciting meaningful options to the player’s controls.

The items we decided to tackle in this new release are below. We do plan on tackling more items in the next demo but these were the ones we felt were common amongst the three and were quickest to handle.

Items Addressed

  1. Improved speed and maneuverability
    1. This was something we had grown accustomed to as we played the game, but as soon as we got the player feedback we went back and revisited the fundamentals and knew we had to address it. The cars now accelerate much faster and turn much quicker. It may not be perfect yet but it is much better. Check it out and let us know what other tweaks you might want.
  2. More Predictable Ball Handling
    1. This was already in progress by the time we received the IndieCade feedback but it was good to know we were working on something that others noticed. Previously the cars had a top and bottom collider, but after playing Rocket League and realizing that the ball didn’t always collide directly with the car – we decided to switch our colliders from two box colliders to a capsule collider. This little change did the trick and the ball now gets hit and goes in the direction you were trying. Much more rewarding experience.
We are open to suggestions and feedback with this new release but we plan on addressing a couple more items still. We will continue to focus on the fundamental car handling and making sure that is on target and we will be focusing on getting more polished content.
Try the demo by downloading it here:
If you want to contact us directly check out our Facebook, Twitter or other contact methods like old fashioned email!