I've figured out how to make use of the tightly coupled memory (TCM) on the SAMS70. The TCM comes in two chunks: ITCM for code and DTCM for data. Whatever you configure as TCM is carved out of the SRAM, so the amount of ordinary SRAM goes down accordingly. I've compromised and configured 64K each of ITCM and DTCM, which leaves 128K of SRAM.
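To make the trade-off concrete, here is a minimal sketch of the arithmetic. It assumes the TCM size is picked by a 2-bit setting with the encoding 0 = none, 1 = 32K, 2 = 64K, 3 = 128K per TCM (check the datasheet before relying on that), and it assumes a 256K part, which is inferred from the numbers above rather than stated. The function names are made up for illustration.

```c
#include <stdint.h>

/* Size in KB of EACH TCM (ITCM and DTCM get the same amount) for a
 * 2-bit configuration value. The encoding is an assumption:
 * 0 -> 0K, 1 -> 32K, 2 -> 64K, 3 -> 128K. */
unsigned tcm_kb_each(unsigned cfg)
{
    static const unsigned kb[4] = { 0u, 32u, 64u, 128u };
    return kb[cfg & 3u];
}

/* Ordinary SRAM left over once both TCMs have been carved out.
 * Returns 0 if the request does not fit in total_kb. */
unsigned sram_kb_remaining(unsigned total_kb, unsigned cfg)
{
    unsigned tcm_total = 2u * tcm_kb_each(cfg);
    return (tcm_total > total_kb) ? 0u : total_kb - tcm_total;
}
```

With an assumed total of 256K, `sram_kb_remaining(256, 2)` gives the 128K mentioned above, and the largest setting would leave no ordinary SRAM at all.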
I'm using the ITCM for some of the hot-spot code centered around the cryptography. The gain from that is smaller than it might have been, because I had already enabled the ICache (enabling the DCache isn't quite so easy, as you have to sprinkle cache invalidation calls throughout your code to get away with it). I've moved the stack, the main disk block buffer and a few other crypto-related data structures into DTCM. I tried to move everything there, but that didn't work. Still, even with just that much (and with -O3), the throughput for sequential reads is north of 1.3 MB/sec.
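For anyone wanting to try the same thing, placing code and data in TCM with GCC usually comes down to section attributes plus matching linker script entries. The following is only a sketch, not the actual Orthrus code: the section names `.itcm_text` and `.dtcm_bss`, the macros, the buffer and the XOR routine (a stand-in for a real crypto inner loop) are all hypothetical. It also assumes the startup code copies the ITCM section into place and zeroes the DTCM section before `main()` runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names. They must match output sections that the
 * linker script places at the ITCM and DTCM addresses. On non-ARM
 * builds the macros expand to nothing so the code still compiles. */
#if defined(__arm__) && defined(__GNUC__)
#define ITCM_CODE __attribute__((section(".itcm_text"), noinline))
#define DTCM_DATA __attribute__((section(".dtcm_bss")))
#else
#define ITCM_CODE
#define DTCM_DATA
#endif

#define BLOCK_SIZE 512u

/* Disk block buffer kept in DTCM (the size is an assumption). */
DTCM_DATA static uint8_t block_buf[BLOCK_SIZE];

/* Accessor so other modules can reach the DTCM buffer. */
uint8_t *block_buffer(void)
{
    return block_buf;
}

/* Stand-in for a hot crypto loop: XOR len bytes of src into dst.
 * Marked to run from ITCM. */
ITCM_CODE void xor_block(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] ^= src[i];
    }
}
```

Tagging individual objects like this makes it easy to move things in and out of TCM one at a time, which helps when moving everything at once doesn't work.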
I've also managed to extract the chip's unique ID and turn it into a USB serial number string. It shouldn't really matter, but you can use it to tell two Orthrus units apart (or to make sure that your Orthrus hasn't had its controller swapped out without your noticing).
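The conversion itself is simple enough to sketch. The code below assumes the unique ID has already been read out of the chip into four 32-bit words (128 bits), and it leaves the hardware-specific read out entirely. The function names and the choice of uppercase hex are illustrative, not necessarily what the firmware does. The descriptor layout (length byte, type 0x03, then UTF-16LE characters) is the standard USB string descriptor format.

```c
#include <stddef.h>
#include <stdint.h>

#define UID_WORDS    4u                 /* 128-bit ID as four 32-bit words */
#define SERIAL_CHARS (UID_WORDS * 8u)   /* 8 hex digits per word */

/* Render the ID words as 32 uppercase hex digits plus a terminating NUL. */
void uid_to_serial(const uint32_t uid[UID_WORDS], char out[SERIAL_CHARS + 1])
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (size_t w = 0; w < UID_WORDS; w++) {
        for (int shift = 28; shift >= 0; shift -= 4) {
            out[n++] = hex[(uid[w] >> shift) & 0xFu];
        }
    }
    out[n] = '\0';
}

/* Build a USB string descriptor from an ASCII string:
 * byte 0 = total length, byte 1 = 0x03 (STRING), then UTF-16LE text.
 * Returns the descriptor length, or 0 if it does not fit. */
size_t usb_string_descriptor(const char *s, uint8_t *desc, size_t cap)
{
    size_t len = 0;
    while (s[len] != '\0') {
        len++;
    }

    size_t total = 2u + 2u * len;
    if (total > 255u || total > cap) {
        return 0;
    }

    desc[0] = (uint8_t)total;
    desc[1] = 0x03;
    for (size_t i = 0; i < len; i++) {
        desc[2u + 2u * i] = (uint8_t)s[i];  /* low byte: ASCII code */
        desc[3u + 2u * i] = 0;              /* high byte: zero for ASCII */
    }
    return total;
}
```

A 32-character serial produces a 66-byte descriptor, comfortably under the 255-byte limit imposed by the one-byte length field.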