Well, that was a fun experiment, and I managed to get everything working, but I had to remove COM from the BIOS. The problem was that, with the MGIA video buffer spanning $10000-$13E7F, I only have 384 bytes for the BIOS to use, between $13E80 and $13FFF. I don't nest subroutines very deeply, so you'd expect this to be plenty of space, even with GCC, which requires the stack pointer to be aligned on a 16-byte boundary at all times.
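For the curious, here is the arithmetic behind those numbers as a small C sketch. The addresses come straight from the memory map above; the macro and function names are hypothetical, and treating 16 bytes as the smallest possible frame is an assumption (any frame that saves at least a return address gets rounded up to one 16-byte unit under that alignment rule).

```c
#include <stdint.h>

/* Addresses from the memory map described above; both ranges are
   inclusive. The macro names themselves are hypothetical. */
#define MGIA_FB_FIRST  0x10000u  /* first byte of the MGIA video buffer */
#define MGIA_FB_LAST   0x13E7Fu  /* last byte of the MGIA video buffer  */
#define BIOS_RAM_FIRST 0x13E80u  /* first byte left over for BIOS use   */
#define BIOS_RAM_LAST  0x13FFFu  /* last byte left over for BIOS use    */
#define STACK_ALIGN    16u       /* stack alignment GCC maintains       */

/* Number of bytes in the inclusive range [first, last]. */
static uint32_t range_size(uint32_t first, uint32_t last)
{
    return last - first + 1u;
}

/* Upper bound on nesting depth if every frame were the smallest
   possible 16-byte-aligned frame (assumption: no frame is empty). */
static uint32_t max_min_frames(void)
{
    return range_size(BIOS_RAM_FIRST, BIOS_RAM_LAST) / STACK_ALIGN;
}
```

The video buffer works out to 16000 bytes and the leftover BIOS area to 384 bytes, so even in this best case only 24 frames fit. Real frames that also hold saved registers or locals are larger, so the practical depth is lower still.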
Well, it turns out that, for some reason, COM method calls resulted in a bit too much nesting, since extra stack space was needed to hold things like method table pointers. As a result, the BIOS stack would grow down into the video buffer and corrupt the display. Don't get me wrong: nothing ever crashed (though it could have under some pathological cases), but it was a very obvious bug that needed repair.
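To see why a little extra per-call overhead matters, here is a rough model in C. It assumes, as is conventional for GCC targets like this one, that the stack pointer starts just past $13FFF and grows downward, so the first byte clobbered on overflow is the last byte of the video buffer at $13E7F. The names are hypothetical, and the frame sizes used below are illustrative guesses, not measurements of the actual BIOS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the stack pointer starts just past the BIOS area and
   the stack grows downward, toward the end of the video buffer. */
#define FB_LAST     0x13E7Fu                      /* last video buffer byte */
#define STACK_TOP   0x14000u                      /* assumed initial SP     */
#define STACK_BYTES (STACK_TOP - (FB_LAST + 1u))  /* 384 bytes              */

/* Lowest address written once `depth` nested calls, each with a frame of
   `frame_bytes` bytes, are active. Caller keeps the product < STACK_TOP. */
static uint32_t lowest_address(uint32_t depth, uint32_t frame_bytes)
{
    return STACK_TOP - depth * frame_bytes;
}

/* True if that nesting pattern would write into the visible display. */
static bool corrupts_display(uint32_t depth, uint32_t frame_bytes)
{
    return lowest_address(depth, frame_bytes) <= FB_LAST;
}
```

Under this model, 8 levels of 48-byte frames fit exactly, but growing each frame to 64 bytes (say, to carry an interface pointer and a method table pointer) cuts the safe depth to 6; a seventh level lands in the display.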
Most video bugs are, I guess, "very obvious bugs" by definition. ;)
Instead of COM, I now use a flat entry-point vector (a table of function pointers). Programs loaded from secondary storage have no idea where this table is actually located, so they must scan memory for a special signature to find it. It's hokey and hacky, but it works, and it gives me the freedom to change where programs are loaded in memory at any time in the future. The only hard requirement currently in place is that the signature must reside in the first 128KiB of the CPU's address space.
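A minimal sketch of such a scan might look like the following. The signature string, the table layout, and the entry names (put_char, get_char) are all hypothetical, since the actual BIOS table isn't described here. To keep the sketch testable on a host, it scans a caller-supplied buffer rather than raw physical memory, and it only checks offsets that are multiples of the table's alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-byte signature marking the start of the BIOS vector. */
#define BIOS_SIG      "KBIOSVEC"
#define BIOS_SIG_LEN  8u
/* The signature must live somewhere in the first 128KiB. */
#define SCAN_LIMIT    (128u * 1024u)

/* Hypothetical flat entry-point vector: a signature followed by
   plain function pointers, one per BIOS service. */
typedef struct {
    char signature[BIOS_SIG_LEN];
    void (*put_char)(int c);
    int  (*get_char)(void);
} bios_vector;

/* Scan [base, base + len) for the signature, stepping by the table's
   alignment (base itself is assumed to be suitably aligned).
   Returns a pointer to the table, or NULL if none is found. */
static const bios_vector *find_bios_vector(const unsigned char *base, size_t len)
{
    size_t limit = len < SCAN_LIMIT ? len : SCAN_LIMIT;
    size_t stride = _Alignof(bios_vector);

    for (size_t off = 0; off + sizeof(bios_vector) <= limit; off += stride) {
        if (memcmp(base + off, BIOS_SIG, BIOS_SIG_LEN) == 0)
            return (const bios_vector *)(const void *)(base + off);
    }
    return NULL;
}
```

Once found, a program calls through the table directly (for example, `v->put_char('A')`), so each BIOS call costs one indirect jump rather than a COM-style method dispatch. One pitfall: a program that embeds the signature as a constant carries its own copy of it, so a real implementation would need to make sure the scan doesn't stop there.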
This more direct approach to linking RAM-resident programs to ROM functionality has reduced stack pressure enough that the video display is no longer corrupted during normal operation of the TIM/V monitor.
384 bytes is a relatively small amount of stack space, especially for software compiled with GCC. Point blank, C is not a good language in which to write small, tight, highly efficient software. The problem is not code size (at least not directly); rather, it's how C uses its stack. Overloading a single stack with both continuation and state information places greater-than-expected pressure on the stack, for two reasons:
- First, the compiler must generate stack frame constructors and destructors (prologues and epilogues) for each procedure you compile. If all you're doing is threading a value from one function to another three call levels down, the two intermediate procedures must include code to shuttle that data along: wasted instructions and stack space that serve no other useful purpose. In COM, this happens with surprising frequency. In a dual-stack environment, this never happens, because the intermediate procedures can simply leave the value on the data stack untouched. The result is less code to execute and smaller stack frames, without sacrificing proper code structure.
- Second, Forth-like languages often use two (or more) small stacks rather than one big stack. Separating data from continuation information makes it vastly easier to recycle the relevant stack space. It's not uncommon for very small Forth systems to have only 8 to 16 slots on their data and/or return stacks. Phil Koopman, in his book Stack Computers: The New Wave, documents how rare it is for software to exceed 24 slots on the data stack or 16 slots on the return stack. Packing a Forth runtime environment's stacks into 384 bytes would be trivial.
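As a rough illustration of both points, here is a C model of a dual-stack layout sized to the figures above (24 data cells, 16 return cells). Cell width is an assumption: with 64-bit cells, the two stacks need (24 + 16) × 8 = 320 bytes, leaving room to spare in 384. All names are hypothetical. The three "words" show a value threaded through two intermediate levels without either one copying it; the C call chain here merely stands in for the return stack, which this sketch reserves but does not use.

```c
#include <stddef.h>
#include <stdint.h>

#define DSTACK_CELLS 24   /* data-stack depth, from the figures above   */
#define RSTACK_CELLS 16   /* return-stack depth, from the figures above */

typedef uint64_t cell_t;  /* assumption: 64-bit cells */

/* Two small, separate stacks instead of one large C stack. */
typedef struct {
    cell_t data[DSTACK_CELLS];
    cell_t ret[RSTACK_CELLS];
    size_t dsp;  /* cells currently on the data stack   */
    size_t rsp;  /* cells currently on the return stack */
} forth_stacks;

/* The whole structure, indices included, fits in the BIOS's 384 bytes. */
_Static_assert(sizeof(forth_stacks) <= 384, "stacks must fit in 384 bytes");

/* Push onto the data stack; returns -1 on overflow. */
static int dpush(forth_stacks *s, cell_t v)
{
    if (s->dsp == DSTACK_CELLS)
        return -1;
    s->data[s->dsp++] = v;
    return 0;
}

/* Pop from the data stack; returns -1 on underflow. */
static int dpop(forth_stacks *s, cell_t *out)
{
    if (s->dsp == 0)
        return -1;
    *out = s->data[--s->dsp];
    return 0;
}

/* Innermost word: doubles the top data-stack cell in place. */
static void word_double(forth_stacks *s)
{
    if (s->dsp > 0)
        s->data[s->dsp - 1] *= 2;
}

/* Intermediate words: the argument stays on the data stack, so
   neither word needs any code to receive or forward it. */
static void word_middle(forth_stacks *s) { word_double(s); }
static void word_outer(forth_stacks *s)  { word_middle(s); }
```

Pushing 21 and calling `word_outer` leaves 42 on the data stack; `word_middle` and `word_outer` never touch the value. In a real Forth, such intermediate words would compile to little more than a call or a jump.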
I'd like to rewrite the software in a static subset of Forth, using a compiler specially designed for the Kestrel-2DX's unique memory constraints. Such a compiler should be quite miserly with stack space in practice. Obviously, I have something that works now, so I've not yet decided whether I will go forward with the Forth compiler idea.
Either way, software bootstrapped from secondary storage is expected to relocate the stack, so the 384 bytes configured at system start-up are not set in stone.