The basic method for Decimal Mode is to perform an ADD or SUB operation, and then convert the result to BCD. The process is to work on each nibble in turn, as follows:
Adder LO --> Detect LO --> Generate LO --> Adjust LO --> BCD result LO
Adder HI --> Detect HI --> Generate HI --> Adjust HI --> BCD result HI
Detect_LO tests whether the lower nibble needs to be adjusted. This is the case if the binary result is greater than 9, or if the low-nibble carry (C4) is high. To adjust an ADD result, Generate_LO generates a 6 (or 0 if no adjustment is needed), which is then applied to the binary result by Adjust_LO. When an adjustment is needed, Generate_LO also generates a BCD low-nibble carry (BCDLC). The process is the same for the upper nibble, except that BCDLC must be added to the upper-nibble result. The same logic holds for subtraction, except that Generate_LO and Generate_HI produce $A rather than 6 to perform the adjustment.
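To make the nibble-by-nibble process concrete, here is a rough behavioural sketch in Python. It is only an illustration of the Detect / Generate / Adjust steps for valid BCD operands, not a model of the actual circuit. The function names bcd_add and bcd_sub are made up for this example, and each nibble is handled as a separate small sum for simplicity.

```python
def bcd_add(a, b, carry_in=0):
    """Add two packed-BCD bytes. Returns (result, carry_out)."""
    # Low nibble: binary add, then Detect / Generate / Adjust.
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    adjust_lo = lo > 9                       # nibble > 9, or C4 set (lo >= 16)
    bcdlc = 1 if adjust_lo else 0            # BCD low-nibble carry
    lo_result = (lo + (6 if adjust_lo else 0)) & 0x0F

    # High nibble: same steps, with BCDLC added in.
    hi = (a >> 4) + (b >> 4) + bcdlc
    adjust_hi = hi > 9
    hi_result = (hi + (6 if adjust_hi else 0)) & 0x0F
    return (hi_result << 4) | lo_result, 1 if adjust_hi else 0


def bcd_sub(a, b, carry_in=1):
    """Compute a - b in packed BCD (carry_in=1 means "no borrow", as on the 6502)."""
    lo = (a & 0x0F) - (b & 0x0F) - (1 - carry_in)
    adjust_lo = lo < 0                       # the low nibble had to borrow
    lo_result = (lo + (0xA if adjust_lo else 0)) & 0x0F

    hi = (a >> 4) - (b >> 4) - (1 if adjust_lo else 0)
    adjust_hi = hi < 0
    hi_result = (hi + (0xA if adjust_hi else 0)) & 0x0F
    return (hi_result << 4) | lo_result, 0 if adjust_hi else 1
```

With this sketch, bcd_add(0x19, 0x28) gives the pair (0x47, 0) and bcd_sub(0x42, 0x15) gives (0x27, 1). Invalid BCD inputs are deliberately ignored here.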
Now the binary adder alone consumes the entire cycle at 100MHz, so Decimal Mode at high speed will need to take two cycles to complete (like it does on the 65C02). A happy consequence of this is that we can use the ALU adder for both the original binary operation and the subsequent adjust operation. To do so we feed the result of the initial binary addition back into the ALUA input, and feed an appropriate Adjust Value into the ALUB input for each nibble.
Because the binary result for the lower nibble emerges from the adder early in the initial cycle, we are able to generate the lower nibble Adjust Value in the same cycle, like this:
Cycle 1: Adder LO --> Detect LO --> Generate LO --> ALUB
Cycle 2: ALUB LO --> Adder LO (B input) --> BCD Result
The high nibble, on the other hand, is not ready until the very end of the initial cycle. We must therefore generate the Adjust Value for the high nibble in the second cycle, like this:
Cycle 1: Adder --> ALUA
Cycle 2: ALUA HI --> Detect HI --> Generate HI --> Adder HI (B input) --> BCD Result
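The two-cycle flow can be sketched in Python along the following lines. This is a behavioural illustration only, for addition with valid BCD operands. The names adder and decimal_add_two_cycle are invented for the example, and the high-nibble detect condition is the textbook decimal-adjust rule, which I am assuming here rather than reading off the schematic. The point is simply that the same adder function serves both cycles, with the first sum fed back as ALUA and the Adjust Value supplied as ALUB.

```python
def adder(a, b, carry_in=0):
    """8-bit binary adder with a full carry chain. Returns (sum, C4, C8)."""
    c4 = ((a & 0x0F) + (b & 0x0F) + carry_in) >> 4
    total = a + b + carry_in
    return total & 0xFF, c4, total >> 8


def decimal_add_two_cycle(a, b, carry_in=0):
    """Decimal-mode ADD in two passes through the same adder."""
    # Cycle 1: binary add. The low nibble is ready early, so its
    # Adjust Value can be generated in this same cycle.
    alua, c4, c8 = adder(a, b, carry_in)
    lo_over = (alua & 0x0F) > 9
    alub_lo = 0x06 if (lo_over or c4) else 0x00

    # Cycle 2: generate the high-nibble Adjust Value from the latched
    # sum (assumed detect rule), then run the adder again.
    hi = alua >> 4
    adjust_hi = bool(c8) or hi > 9 or (hi == 9 and lo_over)
    alub = (0x60 if adjust_hi else 0x00) | alub_lo
    result, _, _ = adder(alua, alub)
    return result, 1 if adjust_hi else 0
```

For example, decimal_add_two_cycle(0x99, 0x01) latches $9A in the first cycle, adds $66 in the second, and returns (0x00, 1). Note that this version keeps the full carry chain, so the carry out of the adjusted low nibble ripples into the high nibble during the second add.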
This will work, as long as the high nibble Adjust Value can be generated quickly. Adding an alternate path to the B input of the adder will add capacitance, but only minimally so and only to the high order bits of the carry-chain where we can tolerate some delay.
Thanks to Dr Jefyll and ttlworks, the BCD adjust circuit in the C74-6502 is very fast already, and we can adapt it for our purposes here. This circuit produces results that are compatible with the NMOS 6502 for both decimal and non-decimal inputs. It uses FET Switches for time critical logic. With a little rejigging, we can adapt it to work in this new design, as is shown in this rough schematic:
The high-nibble Adjust Value is generated by four FET Muxes in series (BCD.DETECT.HI, BCD.DETHI.AUX, BCD.SEL.HI and ALUB.SEL). This value is then fed into the high-nibble of the FET Adder. Earlier tests showed that CBTLV switches took about 1ns longer than AUC parts in the carry chain. The Adjust Value path is therefore likely to delay the adder result by that margin as well. Thankfully, because the results of Decimal Mode operations are never used as addresses, the Adjust Value path does not have to meet the 1.5ns setup time of the synch RAM. We therefore should have just enough extra time for this path to work.
In order to remove from the adder the delay associated with the BCD carry, it’s easiest to break the carry chain at C4 and perform two separate adds for the low and high nibbles. The BCD carry can then be added in at the end as bit 0 of the high-nibble Adjust Value. In order to make this work, Detect_HI must adjust the threshold to test for > 8 for addition and < $F for subtraction. The ADJ1 and ADJ7 values that are input to BCD.DETECT.HI achieve that in the schematic above.
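Here is a rough Python sketch of the split-chain idea, for addition only and for valid BCD operands. It reflects my reading of the scheme rather than the schematic itself: I am assuming the lowered threshold (> 8) applies when a BCD low-nibble carry is pending, and that the usual > 9 test applies otherwise. The names nibble_add and decimal_add_split are hypothetical. Subtraction is left out, since it would need more detail than is given here.

```python
def nibble_add(x, y, carry_in=0):
    """One 4-bit slice of the adder. Returns (nibble, carry_out)."""
    total = x + y + carry_in
    return total & 0x0F, total >> 4


def decimal_add_split(a, b, carry_in=0):
    """Decimal-mode ADD with the carry chain broken at C4."""
    # Initial add: the high nibble sees a carry-in of 0 (C4' tied low).
    lo, c4 = nibble_add(a & 0x0F, b & 0x0F, carry_in)
    hi, c8 = nibble_add(a >> 4, b >> 4, 0)

    # Low nibble: detect and generate as usual.
    bcdlc = 1 if (c4 or lo > 9) else 0
    adj_lo = 6 if bcdlc else 0

    # High nibble: a pending BCD carry lowers the threshold from > 9 to > 8
    # (assumption), and the carry itself rides in as bit 0 of the Adjust Value.
    threshold = 8 if bcdlc else 9
    adjust_hi = bool(c8) or hi > threshold
    adj_hi = (6 if adjust_hi else 0) + bcdlc   # one of 0, 1, 6 or 7

    # Adjust add: again two independent nibble adds.
    lo_result, _ = nibble_add(lo, adj_lo)
    hi_result, _ = nibble_add(hi, adj_hi)
    return (hi_result << 4) | lo_result, 1 if adjust_hi else 0
```

For instance, decimal_add_split(0x15, 0x85) produces nibble sums of $9 and $A, a high-nibble Adjust Value of 7, and a final result of (0x00, 1).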
We can separate the FET carry chain at C4 without adding capacitance by using the INH pin on the 74AUC2G53 C4 IC. An alternate C4' tied to GND can push a zero into the carry chain as needed. Both C4 and C4' can be switched before the ripple carry arrives if the control signal is generated early in the cycle. A 74AUC1G74 that is pre-loaded in the cycle ahead of the ALU operation can generate active-low and active-high control signals to make the switch. (The BRK.CARRY signal going to the FET Adder in the schematic illustrates that function.)
One final note regarding flag evaluation: we can use the final BCD-adjusted result to obtain the correct results for the N, Z and V flags. This behaviour is compatible with the 65C02 and 65816 CPUs. The NMOS 6502, on the other hand, calculates the flags based on the original binary sum, but with the BCD low-nibble carry (BCDLC) added in. Since this value is no longer calculated by the ALU adder, we can include a simple 4-bit incrementer to add BCDLC to the upper nibble of the binary sum. This would be done during the second cycle of the BCD operation.
As will likely be the case with everything in this design, we meet the required timing for this circuit only by the skin of our teeth. It will be impossible to know whether we will reach the target clock-rate until the whole CPU is built. For now, I am doing my best to account for even the smallest delays, and have taken to including clock-skew and trace propagation delay in my estimates of the critical path. That will give me some idea of which components will need to be near each other in the final layout. At these speeds, just getting signals from one side of the board to the other is going to be a challenge!
P.S. Wait, hold the phone!
The carry chain is in fact split at C4 for the initial binary add as well. That means that the low and high nibbles emerge from the adder at exactly the same time, early in the cycle! So we can in fact make a start on the high-nibble Adjust Value in the initial cycle as well, and ease the time crunch on the second cycle. We only really have to capture the BCDHC and /BCDHC and we can generate the Adjust Value in the second cycle from that. Nice.
Phew! Feels good to come across a couple of unclaimed nanoseconds — it’s like finding an open parking spot downtown. :)