The extracted sounds are of poor quality.
That is acceptable for now, since this is only a proof of concept.
A few things are worth noting:
1 - Only part of the photo's surface is used. Using the whole surface would leave room for more data, so the quality should improve.
2 - The spectrogram is greyscale. Using all the available colors would let each pixel carry more information, so the sound quality should improve.
Colored spectrograms already exist, for example ones where one color represents the left channel of a stereo sound and another color represents the right channel.
3 - I don't know much about signal processing, but I have read the following statement:
"Using the abs() function on complex data is a lossy operation which throws away any phase information.
The phase information encodes the waveform shapes and well as the timing of any transients in the FFT window.
Since that information has been discarded, a magnitude spectrum or spectrogram alone can't be turned back into audio that sounds like the original speech.
But if you keep the full complex results of the FFT, then a complex IFFT might be used in some sort of resynthesis process."
Source: https://stackoverflow.com/questions/22890919/how-to-convert-a-speech-spectrum-to-time-domain
If that is the case, a solution would be to print the amplitude and phase spectrograms together: for example, the amplitude in green and the phase in red, with blue used to add more information to the amplitude spectrogram.
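To make the idea in point 3 concrete, here is a minimal Python sketch, assuming NumPy. It is not the code used for this proof of concept. The function names `encode_frame` and `decode_frame` are hypothetical, and using blue as the low byte of a 16-bit amplitude is only one possible reading of "add more information to the amplitude". It packs one FFT frame into red, green and blue values, unpacks it, and compares the result with a reconstruction that keeps only the amplitude.

```python
import numpy as np

def encode_frame(frame):
    """Pack one audio frame into 8-bit red, green and blue arrays (hypothetical layout)."""
    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)  # radians, in (-pi, pi]
    scale = max(float(magnitude.max()), 1e-12)  # avoid division by zero on silence
    # Assumption: amplitude is stored on 16 bits, high byte in green, low byte in blue.
    mag16 = np.round(magnitude / scale * 65535).astype(np.uint16)
    green = (mag16 >> 8).astype(np.uint8)
    blue = (mag16 & 0xFF).astype(np.uint8)
    # Phase is mapped linearly from [-pi, pi] to [0, 255] and stored in red.
    red = np.round((phase + np.pi) / (2 * np.pi) * 255).astype(np.uint8)
    return red, green, blue, scale

def decode_frame(red, green, blue, scale, n):
    """Rebuild the time-domain frame of length n from the three color channels."""
    mag16 = (green.astype(np.uint16) << 8) | blue.astype(np.uint16)
    magnitude = mag16.astype(np.float64) / 65535 * scale
    phase = red.astype(np.float64) / 255 * 2 * np.pi - np.pi
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)

# Test frame: a sine wave with a phase offset, plus a short click (a transient).
n = 1024
t = np.arange(n)
frame = np.sin(2 * np.pi * 50 * t / n + 1.0)
frame[300] += 2.0

red, green, blue, scale = encode_frame(frame)
with_phase = decode_frame(red, green, blue, scale, n)

# Amplitude-only reconstruction: the phase is thrown away (set to zero).
magnitude_only = np.fft.irfft(np.abs(np.fft.rfft(frame)), n=n)

def relative_error(reconstructed, original):
    return np.linalg.norm(reconstructed - original) / np.linalg.norm(original)

err_with_phase = relative_error(with_phase, frame)
err_magnitude_only = relative_error(magnitude_only, frame)
print("error with phase:    ", err_with_phase)
print("error amplitude only:", err_magnitude_only)
```

In this sketch the error with phase comes only from rounding to 8-bit values, while the amplitude-only version loses the timing of the click and the offset of the sine. Whether the phase channel and the low amplitude byte would survive printing and scanning is not tested here.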