Last meeting, machine learning was brought up as a potential option for this project. One key takeaway when designing for individuals with cerebral palsy (CP) is that everyone is unique and has a different range of motion. The question was raised whether it would be feasible to train a model to learn the unique gestures and motions of this population and use those gestures as inputs for our universal remote. Two questions came up: 1) How do we implement it, and 2) How much data do we need to collect?
GESTURE RECOGNITION STUDIES
Gesture recognition and control is an active field of research with useful applications in human-computer interaction. Studies vary widely in their input signals, gesture sets, and classifiers. I looked at 5 studies that were IMU and/or EMG based and summarized them in the tables below.
|Paper||# of Features||Input||Placement||Gestures||Population|
|Jiang (2017)||8||4 EMG + IMU||Forearm||8 hand (static)||10|
|Kang (2018)||27||3 IMUs||Forearms + Head||9 full body (static)||10|
|Mummadi (2018)||10||5 IMU||Fingertips||22 hand (static)||57|
|Kundu (2017)||15||IMU + 2 EMG||Forearm + Hand||6 static + 1 dynamic||5|
|Wu (2016)||5||3 IMU||Forearm, upper arm, body||3 static + 9 dynamic||10|
|Paper||Sets per Gesture||Total Sets||Classifier||Validation||Accuracy|
|Jiang (2017)||30||300||LDA||K-fold cross validation||92.80%|
|Kang (2018)||20||200||Decision trees (+bootstrap)||Experimental||88.01%|
|Mummadi (2018)||5||285||Decision trees (random forest)||K-fold cross validation||92.95%|
|Kundu (2017)||20||100||SVM||K-fold cross validation/Experimental||94%/|
# of Features: The input dimensions of the training model. This could be raw data (accelerometer signals) or processed data (pitch/roll/yaw).
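To make the raw vs. processed distinction concrete, here is a minimal sketch of turning one raw accelerometer reading into pitch and roll angles. The function name and the axis/sign convention are my own assumptions (conventions differ between sensors), and none of the studies above is confirmed to compute features this way. Note that yaw cannot be recovered from an accelerometer alone, so it is left out.

```python
import math

def accel_to_pitch_roll(ax, ay, az):
    """Estimate pitch and roll in degrees from one static accelerometer reading.

    Assumes the only acceleration acting on the sensor is gravity, and
    assumes a convention where a flat, face-up device reads (0, 0, +1 g).
    """
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

# Hypothetical reading: device lying flat, gravity entirely on the z axis.
print(accel_to_pitch_roll(0.0, 0.0, 1.0))
```

Processed features like these reduce the input dimension (3 raw axes become 2 angles) at the cost of discarding dynamic motion information.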
Input: I wanted to focus only on IMU and electrode (EMG) inputs. Some other studies use cameras, which would be less practical for a low-cost, open-source solution.
Placement: Most studies focus on forearm movements. The study by Kang et al. recognized gestures during stationary cycling, which required an IMU on the forehead.
Gestures: Most of these gestures were static hand gestures.
Population: All of these studies were done on able-bodied individuals. For our potential application, we would expect more variability in our data.
Sets per gesture: Each participant was asked to repeat all the gestures numerous times for data logging. As an example, Jiang et al. asked their participants to perform 30 sets of 8 gestures, for a total of 240 repetitions per participant. These sets were taken over 3 training sessions to vary conditions.
Total sets per gesture: The data for all participants were aggregated for each gesture and trained independently. Total sets ranged from 100-300.
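As a quick sanity check on how the totals are derived, the sketch below multiplies participants by sets per gesture for the four studies that appear in both tables. It assumes the "Population" column is the number of participants, which is my reading of the table and not something stated explicitly.

```python
# (participants, sets per gesture, reported total sets), copied from the tables.
studies = {
    "Jiang (2017)": (10, 30, 300),
    "Kang (2018)": (10, 20, 200),
    "Mummadi (2018)": (57, 5, 285),
    "Kundu (2017)": (5, 20, 100),
}

def total_sets(participants, sets_per_gesture):
    """Total recordings of one gesture once all participants are aggregated."""
    return participants * sets_per_gesture

for name, (participants, sets_per_gesture, reported) in studies.items():
    computed = total_sets(participants, sets_per_gesture)
    print(name, computed, "matches table" if computed == reported else "differs")
```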
Classifier: There is a lot of variability in the models. Most of these classifiers are supervised machine learning models except for the last study, which uses a custom algorithm.
Validation: Most training models were validated using K-fold cross validation. Using the study by Mummadi et al. as an example, they had 5 participants. For their validation, they would aggregate data from 4 participants for training the model and use the 5th participant's data for model validation. This process would be repeated 5 times.
The other method is experimental validation, which compares the trained model's predictions against a newly collected dataset. One relevant study was Kundu et al., who validated their model by using it to control a wheelchair. They also compared operating a wheelchair with gesture control vs. a conventional joystick.
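The participant-wise cross validation described above can be sketched as follows. This is a minimal illustration, not the code used in any of the studies; the function name and the toy participant labels are hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_idx, test_idx) once per unique participant.

    participant_ids[i] is the participant who produced recording i.
    """
    for held_out in sorted(set(participant_ids)):
        train_idx = [i for i, p in enumerate(participant_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical dataset: 5 participants with 2 recordings each.
participant_ids = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
folds = list(leave_one_participant_out(participant_ids))
for held_out, train_idx, test_idx in folds:
    print(f"hold out participant {held_out}: "
          f"{len(train_idx)} training, {len(test_idx)} validation recordings")
```

Splitting by participant instead of by recording keeps one person's data from appearing in both the training and validation sets, which would otherwise inflate the reported accuracy.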
Accuracy: Most of these studies reported an overall accuracy of about 90%. Within each study, accuracy for individual gestures varied between 70% and 100%. Most of these studies were proof-of-concept; to improve accuracy, the authors would collect more data and/or choose more distinct gestures.
HOW DO WE IMPLEMENT IT?
Most of these studies used IoT wearable-based systems, which is a relevant platform for our project. Implementation can be broken down into a few steps:
- Design a wearable device to collect motion data
- Collect motion data through training sessions (This data is usually labelled for supervised learning)
- Using the collected motion data, train and validate the model
- Process trained model
- Upload firmware with custom trained model data to wearable device for use
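As one possible reading of the last two steps above, the sketch below turns already-trained model parameters into a C header that could be compiled into the wearable's firmware. Everything here is an assumption for illustration: the model type (per-gesture centroids), the gesture names, the feature values, and the `to_c_header` helper are all hypothetical, and the actual processing step will depend on the classifier we pick.

```python
def to_c_header(centroids):
    """Render {gesture_name: [feature, ...]} as a C array definition."""
    labels = sorted(centroids)
    n_features = len(centroids[labels[0]])
    rows = ",\n".join(
        "    /* " + label + " */ {"
        + ", ".join(f"{v:.4f}f" for v in centroids[label])
        + "}"
        for label in labels
    )
    return (
        f"#define N_GESTURES {len(labels)}\n"
        f"#define N_FEATURES {n_features}\n"
        "static const float CENTROIDS[N_GESTURES][N_FEATURES] = {\n"
        f"{rows}\n"
        "};\n"
    )

# Hypothetical trained parameters: mean (pitch, roll) for two gestures.
trained = {"tilt_up": [11.0, 1.0], "tilt_down": [-42.0, 4.0]}
header = to_c_header(trained)
print(header)
```

Baking the parameters into the firmware as constants means the device only has to run inference, so training can stay on a PC.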
Within the coming weeks, I will try to implement this with a small dataset that I collect myself, to get a better idea of how feasible it is to continue with this idea.
WHAT WOULD WE NEED TO CHANGE FOR USERS WITH CEREBRAL PALSY?
The studies summarized above all used healthy, able-bodied participants. For individuals with impaired motor control, the number of gestures they can perform may need to be reduced. Their data will also be less consistent; to account for this, we may need to collect more data and be wary of collecting "bad" data. Lastly, most of the studies trained and validated on a dataset aggregated across all participants. This may not be possible with the CP population, given the uniqueness of each person's movements. It is also probably not a good idea to train on able-bodied data and apply the model to a CP population. If we cannot aggregate data, we would need to collect more data from each user and maintain an independent model for each person.
HOW MUCH DATA DO WE NEED TO COLLECT?
From the 5 studies, the minimum dataset for a model was 100 sets x 7 gestures. If we assume we cannot aggregate user data, each participant would need to perform 700 actions to match the dataset for that study. This is a lot of data to collect, especially when asking someone with CP to do it. It would be helpful if we could integrate these training sessions with physical therapy sessions. Another helpful approach could be to turn these training sessions into video gaming sessions, where the gestures are part of fun, task-based games.
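The per-user burden is a simple product, sketched below so that different protocol sizes are easy to compare. The helper name is hypothetical; the first call uses the 100 x 7 figure above, and the other two are smaller protocols considered in this post.

```python
def actions_per_user(sets_per_gesture, n_gestures):
    """Recordings one user must perform when data cannot be aggregated."""
    return sets_per_gesture * n_gestures

print(actions_per_user(100, 7))  # matching the smallest dataset in the studies
print(actions_per_user(50, 3))   # a reduced 3-gesture protocol
print(actions_per_user(50, 4))   # the same protocol with a 4th gesture added
```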
As an initial feasibility test, I would reduce the number of gestures to 3 and perform 50 sets of each and see what the accuracy would be. I would then try to add a 4th gesture of 50 sets and retrain, to see the effect on the model and go from there.
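The feasibility test above could be prototyped along these lines before any real data exists. This is only a sketch under loud assumptions: the data is synthetic (Gaussian noise around made-up pitch/roll centers), the classifier is a simple nearest-centroid model that I picked for brevity and that none of the studies used, and all gesture names and parameters are hypothetical.

```python
import random

def make_sets(center, n_sets, noise, rng):
    """Simulate n_sets feature vectors scattered around a gesture's center."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_sets)]

def fit_centroids(samples, labels):
    """Nearest-centroid training: average the feature vectors per gesture."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}

def predict(centroids, x):
    """Return the gesture whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def holdout_accuracy(gesture_centers, n_sets=50, noise=8.0, train_fraction=0.8, seed=0):
    """Generate n_sets per gesture, train on a random split, score on the rest."""
    rng = random.Random(seed)
    data = [(x, name) for name, center in gesture_centers.items()
            for x in make_sets(center, n_sets, noise, rng)]
    rng.shuffle(data)
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    centroids = fit_centroids([x for x, _ in train], [y for _, y in train])
    correct = sum(predict(centroids, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical (pitch, roll) centers for 3 gestures, then a 4th one added.
three = {"raise": [40.0, 0.0], "lower": [-40.0, 0.0], "tilt": [0.0, 40.0]}
four = dict(three, twist=[0.0, -40.0])
acc3 = holdout_accuracy(three)
acc4 = holdout_accuracy(four)
print(f"3 gestures: {acc3:.2f}, 4 gestures: {acc4:.2f}")
```

With real recordings, the synthetic generator would be replaced by logged sensor data, and increasing `noise` gives a rough feel for how less consistent movements might affect accuracy.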