The audio is totally FPGA. There is a digital output pin, run through an RC lowpass filter, that feeds an amplifier. Your mission, should you choose to accept it: solder a speaker onto the J3 pins on the lower left-hand side of the front. (We'll bring the soldering irons and speakers. Or you could go with a pin header.)
The rest is done in gateware. There is a 14-bit, 24 MHz sigma-delta DAC with a few mixers that feed into it. For direct PCM/sample output, you simply write samples to memory and they get played.
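If you haven't met a sigma-delta DAC before, here's a rough Python model of the idea: a 14-bit sample goes in, and a stream of single bits comes out whose density of ones tracks the sample value. The RC filter then averages that stream back into a voltage. This is only a sketch of a first-order modulator; the function name and the first-order structure are assumptions for illustration, and the actual gateware may well do something fancier.

```python
def sigma_delta_bits(sample, n_cycles, bits=14):
    """Model a first-order sigma-delta modulator for a constant input.

    sample   -- unsigned input value, 0 <= sample < 2**bits
    n_cycles -- number of DAC clock cycles to simulate
    Returns a list of 0/1 pin levels, one per clock cycle.
    (Hypothetical helper, not taken from the badge gateware.)
    """
    if not 0 <= sample < (1 << bits):
        raise ValueError("sample out of range")
    mask = (1 << bits) - 1
    acc = 0
    out = []
    for _ in range(n_cycles):
        acc += sample            # accumulate the input every clock
        out.append(acc >> bits)  # the carry-out bit drives the pin
        acc &= mask              # keep only the low `bits` bits
    return out


# A quarter-scale sample produces ones on a quarter of the cycles.
stream = sigma_delta_bits(4096, 1 << 14)
print(sum(stream) / len(stream))
```

The faster the bit clock runs relative to the audio band, the more of the switching noise the simple RC filter can throw away.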
But there's also a full-blown twelve-voice polyphonic synthesizer on board, vaguely inspired by the C64's SID chip, but with a lot more going on. Each voice has configurable attack and release rates, and can run by itself for a predetermined duration once triggered. The result? You tell a voice to play for this long at this pitch and it happens automagically. Your software can get back to work.
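To picture what "trigger it and walk away" means, here's a small Python sketch of one voice's volume envelope: it ramps up at the attack rate while the note is held for its duration, then ramps down at the release rate on its own. The linear ramps, the tick-based units, the 0 to 255 level range, and the function name are all assumptions made up for this illustration, not the real register layout.

```python
def voice_envelope(duration, attack, release, peak=255):
    """Return the per-tick volume of one voice after a single trigger.

    duration -- ticks the note is held (hypothetical unit)
    attack   -- level gained per tick while held
    release  -- level lost per tick after the hold ends
    peak     -- maximum level (assumed 8-bit here)
    """
    if attack <= 0 or release <= 0:
        raise ValueError("rates must be positive")
    level = 0
    levels = []
    # Hold phase: rise toward the peak for `duration` ticks.
    for _ in range(duration):
        level = min(peak, level + attack)
        levels.append(level)
    # Release phase: fall back to silence without any software help.
    while level > 0:
        level = max(0, level - release)
        levels.append(level)
    return levels


print(voice_envelope(duration=4, attack=100, release=50))
```

The point is that the loop above runs in the FPGA, not on the CPU: software only supplies the parameters and the trigger.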
Naturally, all of this is implemented in "hardware" inside the FPGA itself, so if you want to add some exotic features, just get hacking.