Discrete component object recognition

Inspired by a recent paper in Nature, this project aims to develop the simplest system that can recognize digits from the MNIST database.

MNIST number recognition is old and boring. Wikipedia is filled with outstanding error rates reached by complex ML algorithms and pre-processing trickery. Nevertheless, these are overkill, heavy, slow and... boring. They all rely on many moving parts, complex algorithms, expensive training and over-burdened execution.

This project takes a lean approach to object recognition inspired by a recent paper from Mennel et al., published in the March 2020 issue of Nature.

How simple can we go?
* Is a low resolution camera simple enough? Can discrete pixel sensors recognize objects?
* What would be a good enough model? Would a shallow neural network be sufficient? An SVM? What about a simple decision tree?
* Is an Arduino too powerful? Can an array of voltage comparators beat the latest FPGA? What about a 555?

In the end, time constraints and the suitability and availability of components will determine how far we can go towards the goal set for this project.

The goal

The goal of this project is to build a simple machine vision system, trained to do one thing, do it as well as possible with the available resources and do it fast, really fast.

The idea came from a recent publication in Nature, discussed in the Nature Podcast of 4th March 2020. This led me to wonder: how quick is quick, and is there anything in between the system developed by Mennel and a typical camera + processor system?

The project will be divided into 4 parts as follows:

  1. Determine minimum requirements for sensor array
  2. Determine minimum requirements for ML model
  3. Determine minimum requirements for object recognition hardware
  4. Prototyping and final design

The final objective is to have a system that is as close as possible to state-of-the-art ML algorithms but implemented on discrete components for maximum speed at an acceptable complexity level.


R file to reduce the dataset pixel density and evaluate a decision tree model accuracy based on the new images.

r - 2.16 kB - 03/29/2020 at 09:05



Loading the MNIST dataset onto RStudio

r - 1.44 kB - 03/29/2020 at 09:05


  • 1 × Arduino Uno R3
  • 1 × LDR 40k ohm
  • 1 × LDR 14k ohm
  • 1 × LDR 60k ohm
  • 3 × 10 k ohm resistors

  • Introduction

    fernando.eblagon - 4 days ago - 0 comments


    The typical machine vision systems I'm familiar with need to get the camera sensor data encoded into a protocol, transferred from the camera to the microchip, decoded, and finally fed into a pre-trained model which is evaluated by an IC, from which we get a result.

    The Nature paper describes an array of sensors which can be trained to recognize simple letters, i.e. it does the interpretation in the sensor array, forgoing most of the steps described above. The sensor array described in the paper is a complex setup well suited for a Research Institute but not so easy for a hacker to reproduce. At least for me to reproduce.

    This was for me ML on the very edge, literally. I thought that maybe something similar could be obtained with a simpler setup, using off-the-shelf parts, while also forgoing some of the steps typically needed for a machine vision system.

    My take on this was to try and make a prototype machine vision system that could identify images using discrete components by training a system based on decision trees. The sensors would be an array of photoresistors and the decision trees would be built using voltage comparators.

    The training would all be carried out on a PC and the resulting decision tree implemented on an array of voltage comparators. The activation of the voltage comparator would be trimmed by adding resistors with the optical sensors to make voltage dividers which in turn would trigger the selection process.
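As a rough sketch of the divider arithmetic, assume the photoresistor sits on the high side of the divider and the fixed resistor goes to ground (the project does not state the wiring). The voltage seen by a comparator is then V_out = V_cc * R_fixed / (R_fixed + R_ldr). The function names, the 5 V supply and the 1.5 V reference below are illustrative assumptions, not measurements from this project; the LDR and resistor values are the nominal ones from the parts list.

```r
# Output voltage of a divider: LDR on the high side, fixed resistor to ground.
# Assumed topology; swap the two resistances for the opposite arrangement.
divider_voltage <- function(r_ldr, r_fixed, v_cc = 5) {
  v_cc * r_fixed / (r_fixed + r_ldr)
}

# Ideal comparator: TRUE when the divider voltage exceeds the reference.
comparator_out <- function(v_in, v_ref) {
  v_in > v_ref
}

# Fixed resistor that puts the divider exactly at v_ref for a given LDR
# resistance, i.e. the trip point of one tree split.
fixed_for_threshold <- function(r_ldr_trip, v_ref, v_cc = 5) {
  r_ldr_trip * v_ref / (v_cc - v_ref)
}

# Nominal LDR values from the parts list, paired with 10 kOhm resistors.
r_ldr <- c(14e3, 40e3, 60e3)
v_out <- divider_voltage(r_ldr, r_fixed = 10e3)
print(round(v_out, 3))
print(comparator_out(v_out, v_ref = 1.5))
```

Under these assumptions, choosing the fixed resistor is what "trimming" a split amounts to: `fixed_for_threshold()` gives the resistor that makes the comparator flip at a chosen LDR resistance.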

    There is a good possibility that this has been done in the past. If it has, I could not find anything similar.

    Very quickly I realised two important things by looking at the dataset:

       1. Some photoresistors' signals would be ignored most of the time, i.e. my array would not be much of an array after all. It'd be missing a few photoresistors.
       2. Even with some photoresistors missing, a simple array to analyse, e.g., the MNIST number dataset would need a few hundred photoresistors. I'm a father of 3 small kids with a full time job. This project quickly became impossible.

    Then something came to my mind: what if I could simplify the pictures by averaging pixels? Would a lower definition picture reduce the efficacy of the decision tree?

    The first objective was to develop some code to test this. All code was written in R using the RStudio IDE. Not extremely efficient, but a nice IDE I'm familiar with from previous forays into ML.

    The target accuracy, in order to be at the level of a commercial solution, should be above 75%.

    First decision tree

    The first decision tree was produced using the rpart package and the full definition 28x28 digits, and trained with the first 10000 images of the MNIST dataset. Below is a sample of a plot showing a digit represented by a single record on the MNIST matrix with all 28 x 28 = 784 "pixels".

    ```{r}
    image(avpic(2,1))
    ```

    After training the decision tree, the results were as follows:

    Decision tree:

    ```{r}
    rpart.plot(fitt, extra = 103, roundint=FALSE, box.palette="RdYlGn")
    ```


    The confusion table below shows that a lot of numbers get misclassified, but you only get what you pay for. If a better solution is desired, a neural network or a random forest will serve you well but at the expense of this project never seeing the light of day.

    Confusion matrix

    The diagonal shows how often the numbers are correctly identified. All the results outside the diagonal are misclassified digits. Not looking so good.

    ```{r}
    table_mat
    ```

          0   1   2   3   4   5   6   7   8   9
      0 349   5   2  10   8   7   0   4  31   0
      1   1 346   8   7  34  19   0   8   6   7
      2  48  22 215   4  12   6  15   6  42  25
      3  23   6  33 239   6  35   2   7  47  21
      4   1  10  13   7 277   8  11   6  12  40
      5  47  10   5  28  32 155   5  10  54  14
      6  36  24  31   0  63  12 186   4  32  11
      7   1   3   4  16  34   4   0 326   9  22
      8   6  23  27   7   9  13  13   1 241  43
      9  10   9   3  46  28  12   2  52  18 205

    Accuracy calculator

    Ratio between the diagonal of the elements of the confusion matrix and the summation of the matrix and...
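The author's accuracy calculator is cut off here, so the following is only a minimal sketch of the ratio described: the sum of the diagonal divided by the sum of the whole matrix. The function names are hypothetical. The row-wise per-digit figure is an inference rather than the author's stated code; it is used because it reproduces the per-digit values reported in the Minimum sensor log (for example 349/416, about 0.839, for digit 0).

```r
# Confusion matrix of the first decision tree, copied from the table above.
conf_mat <- matrix(c(
  349,   5,   2,  10,   8,   7,   0,   4,  31,   0,
    1, 346,   8,   7,  34,  19,   0,   8,   6,   7,
   48,  22, 215,   4,  12,   6,  15,   6,  42,  25,
   23,   6,  33, 239,   6,  35,   2,   7,  47,  21,
    1,  10,  13,   7, 277,   8,  11,   6,  12,  40,
   47,  10,   5,  28,  32, 155,   5,  10,  54,  14,
   36,  24,  31,   0,  63,  12, 186,   4,  32,  11,
    1,   3,   4,  16,  34,   4,   0, 326,   9,  22,
    6,  23,  27,   7,   9,  13,  13,   1, 241,  43,
   10,   9,   3,  46,  28,  12,   2,  52,  18, 205
), nrow = 10, byrow = TRUE,
dimnames = list(as.character(0:9), as.character(0:9)))

# Overall accuracy: correctly classified digits over all test digits.
overall_accuracy <- function(m) sum(diag(m)) / sum(m)

# Per-digit accuracy: diagonal entry over its row total (assumed convention).
per_row_accuracy <- function(m) diag(m) / rowSums(m)

print(round(overall_accuracy(conf_mat), 4))
print(round(per_row_accuracy(conf_mat), 4))
```

For this matrix the overall ratio is 2539/3997, roughly 63.5%.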


  • Minimum model

    fernando.eblagon - 4 days ago - 0 comments


    The goal of this task is to evaluate the best model for a discrete component object recognition system.

    The first task is to obtain a reference. Typical MNIST classifiers are well below a 5% error rate. We'll use a reference random forest, with no pre-processing, on our reduced dataset.

    Benchmark model

    The dataset was reduced using the tools from the previous task and evaluated using the same metrics.

    Training the Random Forest does tax the system quite significantly. The training of a 10k dataset using parallel processing consumed all 8 cores of my laptop at full blast for a good few minutes.

    The resulting random forest has the following characteristics:

    • 500 trees
    • Accuracy 83.6%

    Considering that:

    • The pixel density used was 4 x 4, instead of 28x28; and,
    • Only 10k records were used and there was no pre-processing on the data,

    the fit is OK.

    The fit for individual values was good and most digits were over 80%.

            0         1         2         3         4         5         6         7         8         9 
    0.8257426 0.8731284 0.9075908 0.8347743 0.8348018 0.8653367 0.9028459 0.8548549 0.7389061 0.7337058 

    Our aim is to make a model, based on decision trees, that can be implemented using discrete components and that matches the accuracy of the random forest model.

    Decision tree models

    The choice of decision trees is due to the simplicity of this model with regards to its later implementation using discrete components.

    Each split on a decision tree would take into account a single pixel and would be evaluated using discrete components.
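To make the mapping concrete, here is a minimal sketch of how a tree of single-pixel threshold tests could be evaluated in software, where each comparison stands in for one voltage comparator. The helper names, the pixel indices and the thresholds are made up for illustration; this is not the tree trained in this project.

```r
# A node is either a leaf (list(label = ...)) or a split that compares one
# pixel against a threshold, which is what one voltage comparator would do.
node_split <- function(pixel, threshold, below, above) {
  list(pixel = pixel, threshold = threshold, below = below, above = above)
}
node_leaf <- function(label) list(label = label)

# Walk the tree: each step is a single "pixel < threshold" comparison.
classify_pixels <- function(tree, pixels) {
  node <- tree
  while (is.null(node$label)) {
    node <- if (pixels[node$pixel] < node$threshold) node$below else node$above
  }
  node$label
}

# Hypothetical two-level tree over a flattened 4 x 4 image (16 pixels).
toy_tree <- node_split(
  pixel = 6, threshold = 0.5,
  below = node_leaf("1"),
  above = node_split(pixel = 11, threshold = 0.3,
                     below = node_leaf("0"),
                     above = node_leaf("8"))
)

pixels <- rep(0, 16)
pixels[6] <- 0.9
print(classify_pixels(toy_tree, pixels))
```

Only pixels 6 and 11 are ever read by this toy tree, which mirrors the observation that a trained tree ignores most of the sensor array.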

    By tuning the complexity parameters, we can increase the complexity of the tree in order to try and fit a decision tree that will approach the accuracy of the random forest.

    The table below shows the effect on accuracy of the cp parameter:

          cp      Accuracy
     0.0010000000 0.6921987
     0.0005000000 0.7162860
     0.0003333333 0.7295383
     0.0002500000 0.7375396
     0.0002000000 0.7389565
     0.0001666667 0.7412902
     0.0001428571 0.7423737
     0.0001250000 0.7422904
     0.0001111111 0.7422904
     0.0001000000 0.7411235

    The default cp parameter for the rpart package in R is 0.01. With successive reductions we obtain no appreciable increase in accuracy with cp below 0.00025, and we're still quite a way from the target accuracy obtained with the random forest.
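A sweep like the one in the table could be sketched as follows. The cp values in the table appear to be 0.001 divided by 1 to 10, so the sketch generates them that way. The function name `sweep_cp` and the column name `label` are assumptions, and the built-in iris data is only a stand-in for the reduced MNIST data frame, which is not bundled here, so the printed accuracies will not match the table.

```r
library(rpart)

# cp values matching the table: 0.001 / 1, 0.001 / 2, ..., 0.001 / 10.
cp_values <- 0.001 / (1:10)

# Fit one classification tree per cp value and score it on held-out rows.
# `label` is the assumed name of the class column.
sweep_cp <- function(train, test, cp_values) {
  sapply(cp_values, function(cp) {
    fit <- rpart(label ~ ., data = train, method = "class",
                 control = rpart.control(cp = cp))
    pred <- predict(fit, newdata = test, type = "class")
    mean(pred == test$label)
  })
}

# Stand-in data: iris replaces the reduced MNIST set for illustration only.
set.seed(1)
dat <- data.frame(label = iris$Species, iris[, 1:4])
idx <- sample(nrow(dat), 100)
acc <- sweep_cp(dat[idx, ], dat[-idx, ], cp_values)
print(data.frame(cp = cp_values, accuracy = acc))
```

A smaller cp lets rpart keep splits that improve the fit only slightly, which is why the tree grows quickly while the accuracy gain flattens out.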

    Even settling for a cp of 0.00025, assuming there's a limit on what can be achieved with decision trees, the result is mind-boggling.

    Could it be implemented using discrete components? Definitely. Maybe.


    Decision trees can achieve a reasonable accuracy at recognizing the MNIST dataset, at a complexity cost.

    The resulting tree can reach a 73% accuracy which is just shy of the 75% target we set out in this project.

  • Minimum sensor

    fernando.eblagon - 6 days ago - 0 comments


    The goal of this task was to evaluate how simple a sensor array could be while still recognizing the MNIST database with commercial-level accuracy.

    According to this site, commercial OCR systems have an accuracy of 75% or higher on manuscripts. Maybe on numbers they do better but we'll keep this number as our benchmark.

    So there are two ways to test the minimum pixel density needed to identify the MNIST database with 75% accuracy:

    1. Grab a handful of sensors, test them against the same algorithm and benchmark them. Time consuming and beyond my budget and time allowance.

    2. Try and train a simple object recognition algorithm on versions of the database with decreasing pixel density. Now, that's up my beach.

    The algorithm of choice was an easy sell. Since I'm focusing on the lowest denominator, decision trees it is.


    The database was obviously the MNIST database; the model was a standard decision tree with the standard parameters included with the rpart package in R.

    The database was loaded using the code from Kory Becker's Git project page and the matrix containing the data was transformed by averaging neighboring cells as below. Code snippets can be found in the last section. Full code to be uploaded as files.

    This is the original 28 x 28 matrix image showing a zero.

    By applying a factor of 2, the matrix becomes a 14x14. Still quite OK.

    By applying a factor of 4, the matrix became a 7x7 matrix. We humans could still tell this used to be a zero, with some effort.

    Now what happens when the matrix is reduced by a factor of 7? Is the zero still recognizable? It's really a hard call. It may as well be a one at an angle, or a four. I chose to keep an open mind since I didn't really know how far the algorithm could see things that I couldn't.

    Finally, the last factor was 14, i.e. the initial 28x28 matrix was averaged to a 2x2. This would be a really poor sensor, but I needed to find the point at which the model couldn't tell the numbers apart and then start going up in pixel density.
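The averaging step walked through above can be sketched in base R as below. This is not the author's `avpic` function, whose source is not shown here; `downsample_image` is a hypothetical name, and the sketch assumes a square numeric matrix whose side is divisible by the factor.

```r
# Reduce an image by averaging non-overlapping factor x factor blocks.
# A 28 x 28 input with factor 2, 4, 7 or 14 gives 14, 7, 4 or 2 pixels a side.
downsample_image <- function(img, factor) {
  stopifnot(nrow(img) %% factor == 0, ncol(img) %% factor == 0)
  n_r <- nrow(img) / factor
  n_c <- ncol(img) / factor
  out <- matrix(0, n_r, n_c)
  for (i in seq_len(n_r)) {
    for (j in seq_len(n_c)) {
      rows <- ((i - 1) * factor + 1):(i * factor)
      cols <- ((j - 1) * factor + 1):(j * factor)
      out[i, j] <- mean(img[rows, cols])  # one averaged "sensor" pixel
    }
  }
  out
}

# Placeholder image: random values standing in for one MNIST digit.
set.seed(1)
img <- matrix(runif(28 * 28), 28, 28)
for (f in c(2, 4, 7, 14)) {
  print(dim(downsample_image(img, f)))
}
```

Averaging equal-sized blocks keeps the overall mean brightness of the image unchanged, so only spatial detail is lost at each step.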


    Once the matrix datasets were ready, it was time to see how the lower resolution pictures fared versus the full resolution database when pitted against the standard decision trees using a 10000 record training set.

    So, the images could be simplified and the number of pixels reduced by averaging them. The decision tree model had already shown a piss poor accuracy for starters. Fewer pixels might not affect it much. So let's see how they fared.

    As a reference, the full 28x28 matrix with the standard decision tree had the following results:

    Overall model accuracy (defined as the sum of the confusion matrix diagonal divided by the total number of tests)
    Individual digit accuracy (obtained by dividing the number of times the number was correctly identified by the number of times the model thought it recognized the number)
             0         1         2         3         4         5         6         7         8         9 
    0.8389423 0.7935780 0.5443038 0.5704057 0.7194805 0.4305556 0.4661654 0.7780430 0.6292428 0.5324675 

    That is to say, 63% of the time it got the number right. Some digits like 0, 1, 7 and 4 fared quite well, whereas the others didn't really make the cut. Let's keep an open mind nevertheless: a monkey pulling bananas hanging on strings would have gotten 10%. The model is doing something after all.

    14 x 14 dataset

    There was a clear improvement in classification for most digits (4 and 8 got worse), with a marginal improvement in overall accuracy, i.e. fewer pixels gave better classification criteria.

    This is the equivalent of going out to the pub and starting to see clearer after a few pints. I'm pretty sure our brains have decision trees and not neural networks.

    Overall model accuracy
    Individual digit accuracy
            0         1         2         3         4         5         6         7         8         9 
    0.8689320 0.8491379 0.5525606 0.5892421 0.5493671 0.4624277 0.5859564 0.8186158 0.5795148 0.7304786 

    7 x 7 dataset






thegoldthing wrote 6 days ago

Very interesting project. Have you considered using an optical mouse for your sensor? Something like the ADNS-2620 has an 18x18 pixel array that can be accessed via i2c. Only problem is you might need some optics or very small letters.


fernando.eblagon wrote 5 days ago

That is a very good tip. I went through the TDS of the ADNS-2620 and it looks reasonably straightforward to implement. This led me to think of another kind of camera that could be used: fingerprint sensor cameras. These communicate via UART instead of i2c.

There are two caveats from the point of view of this specific project when choosing to use a camera. The first is the communication protocols: they all add time to the process, and I want this to be not only simple but also blazingly fast. We're talking ns to ps response times for the voltage comparators, with the slowest components, the sensors, at 20 ms to 30 ms.

Second is the amount of information. The amount of information that is not needed in order to recognize an object is mind-boggling. Once you train a model, it'll tell you specifically what pixels to look at in order to tell objects apart. If you collect more information, it'll just be ignored.

There is no doubt that your suggestion would make a breeze of implementing this once the decision tree starts to grow. I'll keep it in my short list for the final implementation.

Thanks for the tip!


Dan Maloney wrote 6 days ago

I really like the idea of minimalist ML. It's so simple now to throw as much horsepower at ML applications as possible, with Nano and Pi 4 and all that. Seeing it reduced to the minimum will be a real hoot. 

Personally, I'm pulling for the 555s...

