I shrank the time window in which a gesture can be performed from 3 seconds to roughly 1.4-1.5 seconds. It is working well and feels much more fluid than the original. There is still a little room for optimization on gesture timing, but I am happy with its current state.
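A minimal sketch of how the shortened sampling window might look. The function name, sensor callback, and sample rate are all assumptions for illustration; only the 1.5-second window value comes from the log above.

```python
import time


GESTURE_WINDOW_S = 1.5   # shrunk from the original 3 s
SAMPLE_RATE_HZ = 50      # hypothetical sensor rate, not from the log


def sample_gesture(read_sensor, window_s=GESTURE_WINDOW_S, rate_hz=SAMPLE_RATE_HZ):
    """Collect sensor readings for one gesture window, then stop."""
    samples = []
    interval = 1.0 / rate_hz
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        samples.append(read_sensor())  # read_sensor is a placeholder callback
        time.sleep(interval)
    return samples
```

Tightening `window_s` is what makes the interaction feel more fluid, since the system stops listening (and starts classifying) sooner.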
This time when comparing models, the second and third CNNs were better trained than the first. However, the larger second model performed better than the smaller third one, so I proceeded with cnn_model2_half.h5.
Some improvements still need to be made around pandas, however. Since the DataFrame's memory is not released, each gesture sampling consumes more and more memory. One solution is to use the multiprocessing library to run the gesture sampling in a separate process. When a process completes, it returns its memory to the operating system. This should allow for true continuous gesture reads without filling up the memory on the Raspberry Pi.
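A rough sketch of that multiprocessing approach, assuming the sampling step builds a pandas DataFrame (the function names and the placeholder data here are hypothetical). pandas is imported inside the child so all of its allocations live and die with that process:

```python
import multiprocessing as mp


def _sample_gesture(queue):
    # Imported in the child: the DataFrame memory belongs to this
    # process and is returned to the OS when the process exits.
    import pandas as pd

    # Placeholder for the real sensor read; values are illustrative only.
    df = pd.DataFrame({"ax": [0.1, 0.2], "ay": [0.0, 0.1]})
    queue.put(df.to_dict("list"))  # send plain Python data back to the parent


def read_gesture():
    queue = mp.Queue()
    proc = mp.Process(target=_sample_gesture, args=(queue,))
    proc.start()
    result = queue.get()  # drain the queue before joining to avoid a deadlock
    proc.join()           # child exit frees its memory
    return result
```

Calling `read_gesture()` in a loop keeps the parent process's memory footprint flat, since each sampling run is discarded wholesale rather than accumulating in the long-lived interpreter.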
I am still working towards deployment on the ESP32, while making sure there is plenty of room left for other bits of code and processing.
The updated files are reflected in the GitHub repository.