A cheap microphone capsule salvaged from the junk bin, together with the old and ugly LM358 as a preamp, feeds the audio signal to the STM32F103 (bluepill, bottom dev kit). For this MCU, running at 72 MHz with a DMA-capable 12-bit ADC, sampling audio is a walk in the park. I plan to use the ESP (top) to do the IoT / Wi-Fi part of this project.
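To put a rough number on "a walk in the park", here is a minimal sketch of the sample-rate arithmetic for the STM32F103 ADC, where one conversion takes the programmed sampling time plus 12.5 ADC clock cycles. The function name `adc_sample_rate_hz`, the prescaler of 6 (giving a 12 MHz ADC clock, under the 14 MHz limit) and the sampling times used in the note below are assumptions for illustration, not necessarily the settings used in this build.

```c
#include <assert.h>

/* Free-running ADC sample rate in Hz (hypothetical helper, not from the project).
 * One conversion takes (sampling time + 12.5) ADC clock cycles.
 * Cycle counts are passed in tenths of a cycle (239.5 -> 2395)
 * so the whole calculation stays in integer math. */
static unsigned long adc_sample_rate_hz(unsigned long pclk2_hz,
                                        unsigned prescaler,
                                        unsigned sample_time_x10)
{
    assert(prescaler != 0);

    unsigned long adc_clk_hz = pclk2_hz / prescaler;     /* e.g. 72 MHz / 6 = 12 MHz */
    unsigned long cycles_x10 = sample_time_x10 + 125UL;  /* add the fixed 12.5 cycles */

    return adc_clk_hz * 10UL / cycles_x10;               /* truncates to whole Hz */
}
```

With these assumed settings, `adc_sample_rate_hz(72000000UL, 6, 2395)` (the longest sampling time, 239.5 cycles) still gives about 47.6 kHz, which is already above CD-quality rates, and shorter sampling times push it well into the hundreds of kHz. With DMA moving the samples, the core is left free for the processing.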
Ideally you'd use just one MCU, but the ESP has a very slow ADC that is not intended for audio. I do have a couple of SPI codecs (the unconnected Microchip part on the upper breadboard), but it was much simpler to do the heavy ADC + processing job on the ARM Cortex chip.
Parts-wise, the bluepill is even cheaper than the SPI ADC part!