Researching Machine Learning

The project centres on using machine learning to analyze a live cough sample and predict whether it is covid-19 positive or negative. What is the best approach? Has this been done before? Can I access data?

Note: There are many more notes and links in the source code repo in the sagemaker folder.

Has this been done
1. Found research from MIT into covid-19 cough predictions
2. Found research from a South African university into covid-19 cough predictions
Best approach
1. Reviewing the research, there is a common theme: the models use a CNN trained on images rather than on audio data points. There is research to suggest that this gives better results, so we need to be able to produce an "image" of an audio sample and train on that. It may be better to combine the two, but for now we'll just use images.
2. Many started with a known architecture. The ResNet design provided good results, and some appear to be using ResNet50. Finding what works for our model will be trial and error, but this gives us something to start with.
3. The more data the better. The lack of positive and negative samples is a problem that I will need to work through.
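To make the "image of an audio sample" idea from point 1 concrete, here is a minimal sketch that turns a clip into a grayscale image using a log-magnitude spectrogram. This is an assumption on my part about what the image could be, not necessarily what the research used. The function names, frame sizes and the synthetic 440 Hz tone standing in for a cough recording are all hypothetical, and it only needs NumPy.

```python
import numpy as np

def log_spectrogram(samples, frame_len=512, hop=256):
    """Return a 2-D log-magnitude spectrogram (frequency bins x frames)."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size < frame_len:
        # Pad very short clips so that at least one frame exists.
        samples = np.pad(samples, (0, frame_len - samples.size))
    n_frames = 1 + (samples.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the clip into overlapping, windowed frames.
    frames = np.stack([
        samples[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # FFT of each frame gives the magnitude per frequency bin.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range; transpose so rows are frequencies.
    return np.log1p(magnitude).T

def to_uint8_image(spec):
    """Scale a spectrogram to 0..255 so it can be treated as a grayscale image."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    return (scaled * 255).astype(np.uint8)

# Hypothetical stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * t)

image = to_uint8_image(log_spectrogram(clip))
print(image.shape, image.dtype)
```

The resulting 2-D array is the kind of input an image model such as a CNN expects. A real pipeline would load actual recordings and would probably resize or crop the image to whatever input size the chosen architecture needs.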
Can I access the data
1. The university research data is not available to me as I'm not doing this through the university. The models discussed and their code are also unavailable. However, I have found excellent open-source projects which have collected thousands of labeled samples and provide some "starter" code that we can base this project on:
  1. virufy-covid
  2. coughvid

Data preparation for the model
