
Training

A project log for TinyML meets dog training

Learning ML on microcontrollers and perhaps building something fun on the way!

kasikkasik 04/01/2024 at 11:32

It is finally time to train the model that I described in the previous log. The script is available in my repository and is called person_detection.py.
First I need to load the data and make sure the images are 96x96-pixel grayscale. The microcontroller expects int8 input (uint8 is not accepted), hence I also convert the images to that range. I created a module, dataset.py, for loading the data, since I will be reading images in another script as well. I always like to check my dataset, so I pick a picture to display:

plt.imshow(images[1], cmap='gray', vmin=-128, vmax=127)
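
For reference, here is a minimal sketch of what the loading and conversion in dataset.py could look like. The function name, directory layout and use of PIL are illustrative assumptions on my part, not necessarily what the actual module does:

    import os
    import numpy as np
    from PIL import Image

    def load_images(directory, size=(96, 96)):
        """Load images as 96x96 grayscale arrays in the int8 range (illustrative sketch)."""
        images = []
        for filename in sorted(os.listdir(directory)):
            # Convert to grayscale ('L') and resize to the target resolution
            img = Image.open(os.path.join(directory, filename)).convert('L').resize(size)
            # Shift the uint8 range 0..255 down to the int8 range -128..127
            arr = np.asarray(img, dtype=np.int16) - 128
            images.append(arr.astype(np.int8))
        return images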

Then I split the data: 60% as the training set, 20% as the validation set and 20% as the test set. I first split off 20% for testing and then take 25% of the remainder for validation (0.25 × 0.8 = 20% of the total). Note that the train_test_split method shuffles the data by default.

    # Split images into train, validation and test sets
    from sklearn.model_selection import train_test_split

    X_train, x_test, Y_train, y_test = train_test_split(np.array(images), np.array(labels), test_size=0.2)
    x_train, x_val, y_train, y_val = train_test_split(X_train, Y_train, test_size=0.25)

The validation set is used to check the accuracy of the model during training. The next step is to create the model (as per the last log) and finally it is time to train:

    # Fit model on training data
    history = model.fit(x_train, y_train, epochs=EPOCHS, validation_data=(x_val, y_val))

Here we need to choose the batch size and the number of epochs. Batch size specifies how many pieces of training data are fed into the network before its accuracy is measured and its weights and biases are updated. A big batch size can lead to less accurate models: they tend to become specialized to the dataset and are thus more likely to overfit. Too small a batch size, on the other hand, results in a very long training time, since the parameters have to be recalculated much more frequently.
Regarding epochs: this parameter specifies how many times the network is trained on the full training set. The intuition would be the more the better; however, more epochs not only increase the computation time, it also turns out that some networks may start to overfit.
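
To avoid hand-tuning the number of epochs, one option is Keras's EarlyStopping callback, which ends training once the validation loss stops improving. A minimal sketch (my addition, not part of person_detection.py; BATCH_SIZE is an illustrative constant, and batch_size is how the batch size would be passed to model.fit):

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop when the validation loss hasn't improved for 5 epochs
    # and roll back to the best weights seen so far
    early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    history = model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE,
                        validation_data=(x_val, y_val), callbacks=[early_stop])

With restore_best_weights=True the model keeps the weights from the epoch with the lowest validation loss, even if training ran a few epochs past it.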

And voila! We can observe the training progress.

When the training is done, I want to observe the basic metrics:

    # Extract accuracy and loss values (in list form) from the history
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    # Create a list of epoch numbers
    epochs = range(1, len(acc) + 1)

    # Plot training and validation loss values over time
    plt.figure()
    plt.plot(epochs, loss, color='blue', marker='.', label='Training loss')
    plt.plot(epochs, val_loss, color='orange', marker='.', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()

    # Plot training and validation accuracies over time
    plt.figure()
    plt.plot(epochs, acc, color='blue', marker='.', label='Training acc')
    plt.plot(epochs, val_acc, color='orange', marker='.', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.show()

Ideally, the validation accuracy follows the training accuracy closely, and likewise the validation loss should follow the training loss. Of course the model will perform somewhat worse on the validation set, but we don't want the two curves to drift too far apart.

After that, let's try out our test set.

    # Evaluate neural network performance on the held-out test set
    score = model.evaluate(x_test, y_test, verbose=2)

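Since the model is compiled with accuracy as its metric (that is where the history.history['accuracy'] values above come from), score holds the test loss followed by the test accuracy. A small usage sketch:

    # score is a list: [test loss, test accuracy]
    print(f'Test loss: {score[0]:.4f}')
    print(f'Test accuracy: {score[1]:.4f}')
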
Just like that, I have my first model trained. Now, however, it is time to play with the basic hyperparameters and try to achieve better results.

Happy playing!
