The hardware is complete and now collects dust as the firmware development grinds on. The virtual CPU is taking shape and lots of interesting problems are getting solved. One of the major hurdles is building out a functional ALU. The current hardware really doesn't have one, but can utilize lookup tables in the ROM to perform arithmetic operations.

One example is addition. The ROM can be used to add two 8-bits value to produce an 8-bit sum. The easiest way to do this is with a 256x256 table to store the 65,536 possible values (64k bytes). This is a lot of memory though, so the problem is broken down in to two steps using two 256x16 tables (4k bytes each). The first table takes a full 8-bit value and adds the lower nibble. The second table takes the result of the first step and adds the upper nibble.

example: 19 + 28 19 8 + --- 27 2 + --- 47

Both methods have a major flaw though. The ROM produces an 8-bit result, but adding two 8-bit values results in a 9-bit value; there is no carry from the ROM lookup table. The carry is an important component of any ALU, not only a carry out, but also a carry in to allow the chaining of arithmetic operations. Other flags are also useful, such as an **overflow** to indicate a carry from a signed addition, a **zero** flag to indicate if the result was zero, etc. The only flag the ROM table can produce is the **negative** flag, since it is the most-significant bit of the result. This is therefore the only flag available for performing conditional logic.

The negative flag can be used to determine the carry by looking at the before and after state of the addition. This takes a lot of cycles and complicates things. The solution is to add another pair of tables to the ROM to create the **Arithmetic Flag** (AF) operation. This is a little more complex than it sounds due to the way the nibble cycles work. The first cycle returns the flags of adding the two lower nibbles: These include the carry and borrow from adding/subtracting the first 4 bits and a zero and parity flag for the first 4 bits of the result. The second table completes the flag operation using the initial 4 flags and processing the upper nibbles.

To find the flags, the **AF** operation proceeds an **ADD** operation to generate the following:

```
Flag bits - CNZPHOBL
C - Carry (from bit 7)
N - Negative (sign of result)
Z - Zero (high if result is 0)
P - Parity (parity of result)
H - Half carry (carry from bit 3)
O - Overflow (carry from bit 6)
B - Borrow (from bit 7 if subtraction)
L - Low borrow (from bit 3 if subtraction)
```

The original **SUB** operation was removed to make room for **AF** leaving **ADD** as the only byte-wide arithmetic operation in the ROM. Subtractions can still be done by using the **2COM** function (in the default set of unary operations) to negate the value being added. An additional pair of flags indicate if there was a borrow from the equivalent subtraction.

The final problem is the carry in. There's no easy solution to this, so the entire **ADD**/**AF** sequence is needed to first conditionally add/subtract 1 to the accumulator if the carry/borrow bit is set. Then the normal arithmetic operation is performed and the flags from both operations combined. This makes add with carry, and subtract with borrow, some of the most expensive operations on this type of restricted hardware.

## Discussions

## Become a Hackaday.io Member

Create an account to leave a comment. Already have an account? Log In.

The reason for this method was to keep the chip count down. It's not that efficient on memory, but I had the memory to spare. There are some projects that use the full 8x8 LUT, but I wanted to keep the ALU below 128k. The two cycle approach fits the full 8-bit binary operations in just 8k (two tables of 4x8), so I can get 8 operations in 64k.

I think what you propose would require 2-3 extra chips. At a minimum you would need two quad 2:1 multiplexers to switch the upper nibbles to the lower address lines so you can reuse the same 4x4 LUT. There's additional complexity in doing that so at least another chip worth of logic as well.

I'm trying to think if there's a "clever" way of doing it, but I can't think of it. It will be interesting to see what you come up with.

Are you sure? yes | no

As you might have seen there is no need of additional multiplexers if you wire properly one registry (that I called by purpose W).

Didn't see your answer and I edited the one below a bit (since I think you should do 3 lookup instead of the two with nibbles), and also you've the final carry ready to be tested for example for conditional jump.

Are you sure? yes | no

I wasn't quite able to follow the block diagram, so I might need a schematic to fully understand. The 2-cycle approach was about as slow as I would want to go. There is the AF operation (discussed above) that will return the additional 8 status flags. You can choose to add these extra cycles if needed, else you can stick with the shorter basic operation.

Are you sure? yes | no

I personally think that adding the nibbles of the first addend to the full byte of the second or partial sum was not the best solution. I see two problems here that could be solved adding first the lowest nibble togheter and then the high nibbles togheter (as I proposed for microtron, cfr https://forum.gigatron.io/viewtopic.php?f=4&t=255):

1. you would have used 256 byte LUT

2. the result of this partial sum would have given you immediately the carry on the 5th bit

The number of lookups are three (because of additional carry lookup, and table to store) instead of the two of nibbles addition, and you lose also the possibility to have carry ready to propagate for word or longword additions (or to test for conditions).

Did I miss some problem or advantages?

Are you sure? yes | no