If this is the first time you're reading my blog, you may have missed that I'm on a journey to push the limits of machine learning on embedded devices like Arduino boards and the ESP32.

I started with accelerometer data classification, then did WiFi indoor positioning as a proof of concept.

Over the last few weeks, though, I have taken on a harder problem: image classification.

Image classification is where Convolutional Neural Networks really shine, but I'm here to question this assumption and demonstrate that it is possible to come up with much lighter alternatives.

In this post we continue with the examples, replicating a "benchmark" task in machine learning: handwritten digit classification.

If there is a specific image classification task you would like to see implemented, let me know in the comments: I'm always open to new ideas.

The task

The objective of this example is to tell which handwritten digit appears in a photo taken by the ESP32 camera.

In particular, we have 3 handwritten numbers, and the task of our model will be to tell which number each image shows.

Handwritten digits example

I only have a single image per digit, but you're free to draw as many samples as you like: it should help improve the performance of your classifier.

1. Feature extraction

When dealing with images, if you use a CNN this step is often skipped: CNNs are designed to work on raw pixel values, so you just throw the image in and it is handled properly.

When using other types of classifiers, a bit of feature engineering can help the classifier do its job and achieve high accuracy.

But not this time.

I wanted to be as "light" as possible in this demo, so I only took three steps during feature extraction:

  1. use a grayscale image
  2. downsample to a manageable size
  3. convert it to black/white with a threshold

I would hardly call this feature engineering.
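
To make the three steps concrete, here is a minimal sketch that works on a plain grayscale buffer, so you can try it on your PC without a camera. It is not the code from this post: the names (extract_features, kWidth, kHeight, kBlockSize, kThreshold) are made up for illustration, and I'm assuming that each block is averaged before the threshold is applied and that bright blocks map to 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants (hypothetical names), mirroring a 160x120 frame,
// 5x5 blocks and a mid-gray threshold.
constexpr int kWidth = 160;
constexpr int kHeight = 120;
constexpr int kBlockSize = 5;
constexpr int kW = kWidth / kBlockSize;   // 32 columns after downsampling
constexpr int kH = kHeight / kBlockSize;  // 24 rows after downsampling
constexpr int kThreshold = 127;

// Takes a row-major grayscale image of kWidth x kHeight pixels.
// Returns kW * kH features, one per block: 1.0 if the block's mean
// brightness is above kThreshold, 0.0 otherwise.
// Returns an empty vector if the buffer has the wrong size.
std::vector<double> extract_features(const std::vector<uint8_t>& gray) {
    if (gray.size() != static_cast<std::size_t>(kWidth) * kHeight) {
        return {};
    }

    // Step 2: downsample by summing every pixel into the block it belongs to.
    std::vector<uint32_t> sums(kW * kH, 0);
    for (int y = 0; y < kHeight; y++) {
        for (int x = 0; x < kWidth; x++) {
            sums[(y / kBlockSize) * kW + (x / kBlockSize)] += gray[y * kWidth + x];
        }
    }

    // Step 3: turn each block mean into black (0.0) or white (1.0).
    std::vector<double> features(kW * kH, 0.0);
    for (int i = 0; i < kW * kH; i++) {
        const uint32_t mean = sums[i] / (kBlockSize * kBlockSize);
        features[i] = mean > kThreshold ? 1.0 : 0.0;
    }
    return features;
}
```

With these sizes a 160x120 frame shrinks to 32x24 = 768 binary features, which is small enough to feed to a lightweight classifier.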

This is an example of the result of this pipeline.

Handwritten digit feature extraction

The code for this pipeline is really simple and is almost the same as the one from the example on motion detection.

#include <esp_camera.h>

#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     22
#define SIOC_GPIO_NUM     23
#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

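#define FRAME_SIZE FRAMESIZE_QQVGA  // 160x120, matches WIDTH and HEIGHT below
#define WIDTH 160
#define HEIGHT 120
#define BLOCK_SIZE 5
#define W (WIDTH / BLOCK_SIZE)
#define H (HEIGHT / BLOCK_SIZE)
#define THRESHOLD 127

// one feature per downsampled pixel (W x H = 32 x 24 = 768 values)
double features[H*W] = { 0 };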

bool setup_camera(framesize_t);
bool capture_still();
void print_features();

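void setup() {
    Serial.begin(115200);  // baud rate is an assumption; match your serial monitor
    Serial.println(setup_camera(FRAME_SIZE) ? "OK" : "ERR INIT");
}

void loop() {
    if (!capture_still()) {
        Serial.println("Failed capture");
        delay(2000);  // pause before retrying; the exact delay is an assumption
        return;
    }

    print_features();
}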

bool setup_camera(framesize_t frameSize) {
    camera_config_t config;

    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_sscb_sda = SIOD_GPIO_NUM;
    config.pin_sscb_scl = SIOC_GPIO_NUM;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.xclk_freq_hz = 20000000;
    config.pixel_format = PIXFORMAT_GRAYSCALE;
    config.frame_size = frameSize;
    config.jpeg_quality = 12;
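    config.fb_count = 1;

    // Assumed completion (not shown in the original listing):
    // initialize the camera driver and report whether it succeeded.
    return esp_camera_init(&config) == ESP_OK;
}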
