Update

Over the past months I've been working on the hardware design and the microcode.

The number of required ICs had grown too high in my opinion, so some hardware was removed. Every removal cost some effort, because the removed function had to be implemented in microcode instead. I removed the following items:

The 74AC151 that calculated the V flag. The three signals needed to calculate the V flag (the two bit-7 adder inputs and the bit-7 adder output) are now connected to an input port that had some unused inputs, and their value is saved after every addition/subtraction (or 6502 BIT instruction). When the V flag is tested, it is first calculated by using the 3 saved bits to compose an opcode and executing that opcode.
The zero-calculation. This was composed of three 3-input NOR gates (74F27) and a 3-input AND gate (74AC11). The other gates in the 74AC11 could not be used anywhere else, so this saves 2 ICs. After an ALU operation, the result byte is now saved to a (memory-based) register called reg_z. When the Z flag is tested (for a BEQ or BNE instruction), the value 0xFF is added to this register (the same as decrementing it), and if a carry occurs, the value was non-zero. There was also a physical Z flag and an upd_z microcode bit that indicated that this flag had to be updated. Neither is needed any more, and removing upd_z frees a microcode bit that will be used to double the number of available (memory-resident) registers. The price is one extra cycle for every instruction that updates the Z flag, and one extra cycle when the Z flag is tested.
The shift-right multiplexers (2 x 74HC157). Shift-right is now done with a table in RAM (outside the normal 64K section). After reset, some microcode constructs this table.
A special 8-bit buffer that put a byte of microcode on the databus. It was intended for special microinstructions that could place this byte in RAM (at an auto-incrementing pc++ position). This would have been convenient for moving boot code to RAM directly after reset. But this can also be done without the buffer, at the cost of one extra microinstruction per transferred byte and some microcode to organize it.
The separate video RAM. It was intended that the video section would have its own RAM, so that video could be generated while the cpu is doing its own thing. This will now be an option (an additional pcb, called the 'performance option'), and the on-board video is bit-banged by the cpu (the 'economy version'). This saves around 8 or 9 ICs.
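
The three arithmetic replacements above (V flag, Z test, and shift-right table) can be illustrated with a small Python model. This is only a sketch: the function names, the order of the three saved bits, and the way the table is filled by counting are assumptions for illustration, not the actual microcode.

```python
# Illustrative model only; names and bit ordering are assumptions.

# V flag as an 8-entry lookup, indexed by
# (adder input A bit 7, adder input B bit 7, adder output bit 7).
# Overflow occurs only when both inputs have the same sign
# and the result has the other sign: patterns 001 and 110.
V_TABLE = [0, 1, 0, 0, 0, 0, 1, 0]


def add8(a, b, carry_in=0):
    """8-bit add. Returns (result, carry_out, saved_bits)."""
    total = a + b + carry_in
    result = total & 0xFF
    saved_bits = (a >> 7, b >> 7, result >> 7)  # what the input port would capture
    return result, total >> 8, saved_bits


def v_flag(saved_bits):
    """Compute V later, from the three bits saved at the last add."""
    a7, b7, r7 = saved_bits
    return V_TABLE[(a7 << 2) | (b7 << 1) | r7]


def z_flag(reg_z):
    """Z test: add 0xFF to reg_z; a carry means the value was non-zero."""
    _, carry, _ = add8(reg_z, 0xFF)
    return carry == 0


def build_shift_right_table():
    """Fill the 256-entry table by counting only: entries 2n and 2n+1 hold n."""
    table = [0] * 256
    for n in range(128):
        table[2 * n] = n
        table[2 * n + 1] = n
    return table
```

For a subtraction, the bits fed to `v_flag` are the adder inputs as the adder sees them (so the already complemented operand), which is why the same eight-entry table covers both cases.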

Of course I've also been thinking about the video generation. The on-board economy version will have a 6-bit color value in both the A and the T register. A multiplexer selects which of the two colors is connected to the output. There can be a few video modes, and the video mode can be different for each line. The basic video modes are:

160 pixels/line. In every 160 ns cycle, the A register is filled with a new 6-bit color, providing 160 pixels per line with 64 possible colors each. (This is just a special case of the 320 pixels/line mode.)
320 pixels/line. In every 160 ns cycle, the A register gets a new 6-bit color. The remaining two bits of the byte determine the color of the two 80 ns pixels that are displayed in this cycle. Each pixel is either the foreground color from the A register or the background color from the T register.
80 column text mode. In an 'odd' 160 ns cycle, 7 bits are read. 4 of these bits are 40 ns pixels in the odd cycle, and the other 3 bits are 40 ns pixels in the next ('even') 160 ns cycle. The 4th pixel in this even cycle is always background color (the blank pixel between characters). Again, each pixel is either the foreground color from the A register or the background color from the T register. Since only 7 bits of the byte are used, there is one spare bit. It indicates that a new foreground and background color will be loaded in the next two 160 ns cycles (while a space character is displayed).
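
To make the byte layouts of these modes concrete, here is a small Python sketch of how one video byte could be turned into pixels. The text above does not say which bits carry the color and which select the pixels, so the bit positions used here (color in the low 6 bits, select bits and the spare bit at the top, leftmost pixel first) are assumptions, as are all function names.

```python
# Illustrative decoding of one video byte; bit positions are assumptions.

CYCLE_NS = 160  # one cpu cycle, as stated in the mode descriptions


def decode_320(byte, background):
    """320 pixels/line: 6-bit foreground color plus two select bits.

    Returns the two 80 ns pixels of this cycle. A select bit of 1 picks
    the foreground (A register), 0 picks the background (T register).
    With both select bits set this is the 160 pixels/line mode.
    """
    foreground = byte & 0x3F
    selects = ((byte >> 7) & 1, (byte >> 6) & 1)
    return [foreground if s else background for s in selects]


def decode_text(byte, foreground, background):
    """80 column text: 7 pixel bits plus one spare bit.

    Returns (odd_cycle_pixels, even_cycle_pixels, load_colors).
    Each cycle shows four 40 ns pixels; the last pixel of the even
    cycle is always background (the gap between characters).
    """
    bits = [(byte >> (6 - i)) & 1 for i in range(7)]
    pixels = [foreground if b else background for b in bits]
    pixels.append(background)
    load_colors = bool(byte & 0x80)
    return pixels[:4], pixels[4:], load_colors


def active_line_ns(cycles=160):
    """Visible line time: every mode uses 160 cycles of 160 ns."""
    return cycles * CYCLE_NS
```

All three modes consume the same 160 cycles per line (160 pixels, 160 pixel pairs, or 80 characters of two cycles each), so the visible part of a line takes 25600 ns in every mode.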

During the vertical blanking time, the cpu will be executing instructions. At the end of every line it gets an interrupt. The interrupt routine counts the lines, organizes the vertical sync pulse, and checks whether a line of video pixels must be written. (There is a hardware counter that generates the line interrupt.)

This week I was working on the interrupt response time. I measured this time with the emulator while running Apple Basic or TRS80 Basic, and the maximum delay between two tests of the interrupt signal was quite long (more than 40 cycles). Most instructions checked the interrupt by moving the IRQ signal to the F flag, and then jumping to the interrupt routine at the end of the instruction when F was active. But there were a few instructions (like conditional jumps) that needed F for another purpose, or that had no opportunity to move the IRQ to the F flag (because moving to the F flag cannot be combined with an ALU operation like ADD, SUB, AND, INC).

The last problem was solved by defining that the F flag must represent the IRQ state at the end of every instruction. The microcode was changed to accomplish this.

The next step to improve the response time is to test the interrupt state not only at the end of an instruction, but also in the middle or at the beginning of an instruction (especially for instructions that take a lot of cycles). In this case the microcode jumps to a special interrupt entry that sets the PC one or two bytes back, so the same instruction will be repeated when the interrupt has ended. Of course, the interrupted instruction must not already have done anything that will cause misery when it is repeated.
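
The restart idea can be shown with a toy Python model. Everything in it is hypothetical (the class, the 3-byte "increment memory" instruction, the position of the poll); it only demonstrates that polling before the first side effect and stepping the PC back gives the same end result as an uninterrupted run.

```python
# Toy model of a restartable instruction; all names are hypothetical.

class ToyCpu:
    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.irq = False
        self.interrupts_taken = 0

    def service_interrupt(self):
        # Stand-in for the real interrupt routine.
        self.interrupts_taken += 1
        self.irq = False

    def step_inc_abs(self):
        """Hypothetical 3-byte instruction: opcode, addr low, addr high.

        Returns True if the instruction completed, False if it was
        abandoned for an interrupt and must be executed again.
        """
        self.pc += 1                      # opcode fetched
        lo = self.memory[self.pc]
        self.pc += 1                      # low address byte fetched

        # Mid-instruction poll, placed before any write to memory.
        if self.irq:
            self.pc -= 2                  # special entry: PC two bytes back
            self.service_interrupt()
            return False

        hi = self.memory[self.pc]
        self.pc += 1                      # high address byte fetched
        addr = lo | (hi << 8)
        self.memory[addr] = (self.memory[addr] + 1) & 0xFF
        return True


def run_once(irq_pending):
    memory = bytearray(32)
    memory[0:3] = bytes([0xEE, 0x10, 0x00])   # instruction at address 0
    memory[0x10] = 5                          # operand value
    cpu = ToyCpu(memory)
    cpu.irq = irq_pending
    while not cpu.step_inc_abs():
        pass                                  # repeat after the interrupt
    return cpu.pc, memory[0x10], cpu.interrupts_taken
```

Calling `run_once(False)` and `run_once(True)` gives the same final PC and the same memory contents; only the number of interrupts taken differs. If the poll were placed after the write, the repeated instruction would increment the byte twice, which is exactly the kind of misery mentioned above.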

To be continued...