Okay, initial thoughts.
As a first cut I want a program that will play a random piano note, so that I can try to find that note on the piano.
Basically, I'm tuning my ear to be able to recognize piano notes by pitch. I frequently can "hear" notes in my head, but have no idea which notes on the piano these are.
Note distribution and project data
A naive implementation would draw uniformly at random from the 88 keys of a piano, but this is sub-optimal for learning. Most songs don't span the entire piano range, so it doesn't make sense to practice the very high or very low notes as often as the more frequently played ones.
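A minimal sketch of this naive approach might look like the following. It is written in Python for illustration (the project's own scripts are Perl), and the names `uniform_note` and `note_name` are hypothetical. It assumes the usual MIDI numbering in which the 88 keys run from 21 (A0) to 108 (C8) and middle C is note 60.

```python
import random

PIANO_LOW, PIANO_HIGH = 21, 108  # MIDI numbers for A0 and C8 (88 keys)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(midi_note):
    """Name a MIDI note, taking middle C (MIDI 60) as C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"


def uniform_note(rng=random):
    """Pick one of the 88 keys, each with equal probability."""
    return rng.randint(PIANO_LOW, PIANO_HIGH)


if __name__ == "__main__":
    note = uniform_note()
    print(note, note_name(note))
```

Every key is equally likely here, which is exactly the weakness described above: the lowest A comes up as often as middle C.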
A second naive implementation would draw from a normal distribution centered on middle C of the piano, since most songs sit around the middle of the keyboard.
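As a rough sketch, the normal-distribution idea could be written like this in Python. The spread of one octave (12 semitones) is purely an assumption chosen for illustration, and `normal_note` is a hypothetical name. Draws that land off the keyboard are thrown away and retried rather than clamped, so the end keys don't collect extra probability.

```python
import random

PIANO_LOW, PIANO_HIGH = 21, 108  # MIDI numbers for the lowest and highest keys
MIDDLE_C = 60                    # MIDI number for middle C
SPREAD = 12.0                    # assumed standard deviation, in semitones


def normal_note(rng=random, mean=MIDDLE_C, stddev=SPREAD):
    """Draw a key from a normal distribution, redrawing if off the keyboard."""
    while True:
        note = round(rng.gauss(mean, stddev))
        if PIANO_LOW <= note <= PIANO_HIGH:
            return note


if __name__ == "__main__":
    print([normal_note() for _ in range(10)])
```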
A more nuanced implementation would draw from the distribution of actual notes found in actual songs.
Step 1: Get songs
So my first step is to grab a large dataset of songs in MIDI format and extract the note information. It shouldn't matter which songs, so long as they are piano pieces and there are enough of them to get a good statistical average.
So I wrote a program (available in the GitHub project) "GetMutopiaSongs.pl" that downloads all the .midi files for all songs listed as piano pieces.
    Piano/Songs: perl ../bin/GetMutopiaSongs.pl
    :
    :
    Done! 782 files downloaded
    Piano/Songs:
Voila! A nice dataset of over 700 songs.
Step 2: Get notes from one song
The second step is to grab the note information from a single song, so I wrote a program "MidiDump.pl" to do just that:
    Piano/Songs: ../bin/MidiDump.pl leslanciers.mid
    :
    :
    ['note_off', 192, 1, 48, 64],
    ['note_on', 0, 1, 55, 127],
    ['note_off', 192, 1, 55, 64],
    ['note_on', 0, 1, 52, 127],
    ['note_off', 192, 1, 52, 64],
    ['note_on', 0, 1, 55, 127],
    ['note_off', 192, 1, 55, 64],
    ['note_on', 0, 1, 48, 127],
    ['note_off', 768, 1, 48, 64],
    Piano/Songs:
The output is a long series of MIDI "note_on" and "note_off" commands, plus some extra information.
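To show how note numbers could be pulled out of a dump like this, here is a small Python sketch using the events printed above. It assumes each entry reads as event type, delta time, channel, note number, velocity; that field order is an inference from the output, not something the dump states. `notes_played` is a hypothetical helper, not part of MidiDump.pl.

```python
# Events copied from the MidiDump output above. The field meanings
# (type, delta time, channel, note number, velocity) are an assumption.
EVENTS = [
    ["note_off", 192, 1, 48, 64],
    ["note_on", 0, 1, 55, 127],
    ["note_off", 192, 1, 55, 64],
    ["note_on", 0, 1, 52, 127],
    ["note_off", 192, 1, 52, 64],
    ["note_on", 0, 1, 55, 127],
    ["note_off", 192, 1, 55, 64],
    ["note_on", 0, 1, 48, 127],
    ["note_off", 768, 1, 48, 64],
]


def notes_played(events):
    """Return the note number of every event that starts a note."""
    notes = []
    for event in events:
        # A note_on with velocity 0 is conventionally treated as a note_off.
        if event[0] == "note_on" and event[4] > 0:
            notes.append(event[3])
    return notes


if __name__ == "__main__":
    print(notes_played(EVENTS))  # [55, 52, 55, 48]
```

Only the "note_on" events are kept, since each sounded note should be counted once rather than once for its start and again for its release.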
Step 3: Histogram all song notes
I deleted the file "Erlkoenig_alt.mid", as it appears to contain notes outside the range of a piano.
Also, I'm making the tacit assumption that all tracks of all MIDI files refer to a piano instrument. That may not be correct; if not, I'll have to revisit this process later.
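A check of this kind could be automated instead of done by hand. The Python sketch below flags any note numbers that fall outside the piano's MIDI range of 21 through 108; `out_of_range_notes` is a hypothetical helper, and the note list would come from whatever extracts notes from a file.

```python
PIANO_LOW, PIANO_HIGH = 21, 108  # MIDI numbers for the lowest and highest keys


def out_of_range_notes(notes):
    """Return the distinct note numbers that no piano key can play, sorted."""
    return sorted({n for n in notes if not PIANO_LOW <= n <= PIANO_HIGH})


if __name__ == "__main__":
    # Made-up note list for illustration: 20 and 109 lie just off the keyboard.
    print(out_of_range_notes([60, 20, 109, 21, 108, 20]))  # [20, 109]
```

A file whose result is non-empty is a candidate for exclusion from the dataset.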
The program to histogram all notes from all MIDI files in a directory is called "HistNotes.pl":
Piano/Songs: perl ../bin/HistNotes.pl >../Data/NoteHist.txt Piano/Songs:
And this generates a quite respectable piano note distribution:
The MIDI range of notes for a piano is 21 through 108, so this looks good.
Also, certain notes are much more popular than others, which is to be expected.
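The counting step itself is simple. The following Python sketch tallies note numbers across several songs and prints one line per piano key. It is not HistNotes.pl, and the "note count" line format is an assumption, since the actual layout of NoteHist.txt isn't shown here.

```python
from collections import Counter


def histogram(songs):
    """Count how often each note number occurs across all songs."""
    counts = Counter()
    for notes in songs:
        counts.update(notes)
    return counts


def format_histogram(counts, low=21, high=108):
    """One 'note count' line per piano key, including keys never played."""
    return "\n".join(
        f"{note} {counts.get(note, 0)}" for note in range(low, high + 1)
    )


if __name__ == "__main__":
    # Two tiny made-up "songs", each a list of MIDI note numbers.
    counts = histogram([[60, 62, 60], [60, 64]])
    print(format_histogram(counts))
```

Listing every key, even those with a zero count, keeps the output the same length for any dataset, which makes it easier to plot or compare.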
A program to generate random notes can draw from this distribution directly, so that the notes most likely to appear in a piano song will be the ones most likely to be played.
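Drawing from the histogram amounts to a weighted random choice. A minimal Python sketch, assuming the histogram has already been loaded into a dictionary mapping note number to count (the loading step is left out because the file format isn't specified), with `weighted_note` as a hypothetical name:

```python
import random


def weighted_note(counts, rng=random):
    """Pick a note with probability proportional to its count."""
    notes = sorted(counts)
    weights = [counts[n] for n in notes]
    return rng.choices(notes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Made-up counts for illustration: 60 should come up about 9 times in 10.
    example = {60: 90, 72: 10}
    print([weighted_note(example) for _ in range(10)])
```

Notes with a count of zero are never chosen, so keys that no song in the dataset uses simply drop out of the practice rotation.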
Epilogue: Further uses for the data
Having a database of piano songs and the ability to extract notes and other MIDI information might be useful in other ways.
For example, once I have learned all the single notes it might be useful to generate a distribution of all the *pairs* of notes (simple chords of two notes), and later on a distribution of three-note chords.
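One possible way to count pairs, sketched in Python: treat two notes as a pair whenever they sound at the same time. That reading of "pair" is an assumption, as is the event layout (type, delta time, channel, note number, velocity). The sketch handles a single event stream and ignores channels, and `pair_histogram` is a hypothetical name.

```python
from collections import Counter


def pair_histogram(events):
    """Count pairs of distinct notes that overlap in time."""
    active = set()      # notes currently sounding
    pairs = Counter()   # (lower note, higher note) -> count
    for event in events:
        kind = event[0]
        if kind not in ("note_on", "note_off"):
            continue  # skip tempo changes and other non-note events
        note, velocity = event[3], event[4]
        if kind == "note_on" and velocity > 0:
            for other in active:
                if other != note:
                    pairs[tuple(sorted((other, note)))] += 1
            active.add(note)
        else:
            # note_off, or note_on with velocity 0, ends the note
            active.discard(note)
    return pairs


if __name__ == "__main__":
    # Made-up events: a C major triad (60, 64, 67) held together.
    chord = [
        ["note_on", 0, 1, 60, 100],
        ["note_on", 0, 1, 64, 100],
        ["note_on", 0, 1, 67, 100],
        ["note_off", 192, 1, 60, 64],
        ["note_off", 0, 1, 64, 64],
        ["note_off", 0, 1, 67, 64],
    ]
    print(pair_histogram(chord))
```

Three-note chords could be counted the same way by recording combinations of three active notes instead of two.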
More on that later.