Since my pivot from building an AI for my racing game, and starting with the “hello world” AI training data (MNIST) I have started to get into the first step of what anyone who would use my platform would do – Create the Sensors, Actions & Goals.
In my commute to work I have been thinking about the many different ways to try to encode visual information, and will be looking forward to trying different types out, but I think that the most straightforward sensor is one that will interpret the grayscale 28×28 pixel image as a 784 feature vector (28×28=784). Each one of the pixels will be represented by another feature of the vector.
The first 28 features will be the first row of pixels in the image, and the 2nd 28 features will be the 2nd row of features. The number in each one of those features will be a value from 0-1, with 0 being a white pixel and 1 being a black pixel. A gray pixel would be 0.5.
An MNIST image broken down into each pixel. In this example, the first
5 rows are all 0 value, as the 4 is not visible until the 5th row.
The action (output of the network) will be a representation of a number (0,1,2…-9). This means that there are 10 different potential results. The simplest and most straight forward way of encoding this information is in a 10 feature vector. The first feature will represent 0, and the last feature will represent 9.
An example of an action which has 4 as the most
likely number, and 9 as the 2nd likely.
This means that the structure of this network will be a 784 feature vector, that goes through 2 hidden layers (default for Synapse – reasoned from Jeff Heaton findings) and results in a 10 feature vector.
The “motor system” will interpret this 10 feature vector and will print the number to the screen.
(I think I might need to add a diagram of my vision of the standard architecture at some point. Would that be helpful? If you want it let me know.)
[DISCLAIMER]Using “actions” to describe the output of Synapse in this supervised instance is not ideal, but since the goal of Synapse is to be embodied (in a virtual race car initially) I will continue with this naming scheme, just be aware that the output of Synapse is an Action, and in this MNIST implementation, the Action is to interpret the hand written image.
The goal of this network is to determine the correct number that is written in the image. With every goal in synapse, it is required to have a sensor. My rational is that ,just like any goal in life, you can never know when you achieved a goal if you never measure it.
I will need to take the label for the data that lets the system know what number is written in the image and encode it into a sensor. This goal sensor will then be used to optimize the network against when the system trains, or as I term it in Synapse “sleeps”.
The goal sensor will be the exact same encoding pattern as the action is. That is how the system will determine whether the goal was achieved for the image.
The first image is the goal sense, the 2nd is the action. In this example
the system properly interpreted a 4 as the most likely number.
I think that this setup will be a good initial test of the system. Would you have done it differently?