OK, more like enduring joint pain for my work, but that sounds lame.
The glove's machine-learning model starts with an RNN (I'm using a LSTM for now) to pull discrete patterns out of the string of incoming coordinates. I suspect I'll need a bunch of hidden layers after that because letters are made of downstrokes, circles, and other mini-shapes, and I need a bunch of abstraction to distinguish them.
Why does this matter? Recurrent and deep networks are easy to overtrain, and that means my wrist is gonna hurt.
Overtraining is when the machine-learning model aces a test by writing the answers on its hand. The network has internalized the training data, and instead of recognizing an incoming gesture, it just outputs which training sample it looks like. It does great with the training data, but face-plants in the real world. It gets new data, looks at the answers on its hand, doesn't see any close matches, and takes an embarrassingly poor guess.
LSTM's and deep learning are physically large and can encode a lot of data within themselves. In other words, it's got a really big hand and it can't help writing answers on it. There's only one solution, and it's...
Each training run needs to consist of many samples, far more than the model can internalize. If the model tries to make a cheat sheet, it quickly runs out of paper and is forced to actually do the work. With 100 nodes in the densest layer, I estimate that I need about 100 samples for each of the 50 letters, numbers, and punctuation marks to do the job.
That's 5,000 times I need to waggle my hand for science.
The problem is, the network will still inevitably overtrain. I need this glove to work in all kinds of circumstances - when I'm walking, sitting, jogging, have my hands full, can't move much, or need to make big gestures onstage. So, I need...
Even more data.
The machine-learning model, mad lad that it is, tells me to kiss its ass and it memorizes the answers anyways. But I'm a step ahead! I give it a second test with totally different questions on it.
The model blows a wet raspberry and memorizes the answers to that test too. So, when it comes back to class, I give it a third totally different test.
This repeats, the model memorizing tests and me making new tests, until the model starts failing. Its strategy is stretched too thin - the model is just not big enough to encode all those questions and answers, so the only way it can get good grades is to learn patterns, not answers.
This is called k-fold cross-validation - you split your data into subsets so the same model has to ace many different tests at once, and it never gets asked a question from the practice test in the real test.
A rule of thumb is that 10-fold cross-validation is the point of diminishing returns. You need 10 times the data that are required to train the model, to prevent the model from memorizing answers.
That means I need to flap my meat mittens 50,000 times to collect an appropriate training set.
So say I do that. The machine-learning model eventually passes all ten tests, and I rejoice. But that little bastard is wearing an obnoxious grin, almost as if to say, "Stupid monkey! I memorized all ten tests! Eat my shorts!" There's only one way to know for sure, and it is...
I got a fever, and the only prescription is more data.
That's right. As that pissant mass of floating-point operations snickers to itself, I look it dead in the eye, reach into my back pocket, and whip out a double-super-top-secret eleventh test.
This set of data, the sanity check, is special. Unlike the training set, the model isn't allowed to see these data while training. The model has no way to memorize the answers, because it's not allowed to see the grade. I see the grade, and if the model flunks the final exam, I drag it out back and put it out of its misery (reset and start over). If it passes, the...
Read more »