The page register PG is set via the B mux, so it can be set to a literal or to the accumulator value via ABuf.
The same applies to the output register:
spg 3 ; set page register to 3
pga ; move accumulator value to page register
out 5 ; set output register to 5
out ; move accumulator value to output register
These changes will add some functionality while simplifying the instruction decoding - always nice when that happens!
To make this diagram I tried using Digikey's online schematic tool. The end result isn't too bad but it's not particularly nice to use. If anyone has any good suggestions for tools for making this kind of block diagram, please let me know!
The accumulator (A) can take the value B, A+B, A nand B, or IN, where B is either a 4-bit immediate, or a 4-bit value from RAM. The accumulator can be sent to an output register, or written to RAM.
The 8-bit RAM address is generated from 4 bits in the instruction word, and 4 bits from a page register (PG). So there are sixteen pages, each 16 words (nibbles) in size - an address of $3F is location F on page 3.
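As a quick illustration of the paging scheme described above, here is a minimal Python sketch of how the 8-bit address could be formed. The function name `ram_address` is made up for this example; only the "page in the upper nibble, instruction operand in the lower nibble" layout comes from the text.

```python
def ram_address(page: int, offset: int) -> int:
    """Combine the 4-bit page register (PG) with the 4-bit operand
    from the instruction word into an 8-bit RAM address."""
    if not (0 <= page <= 0xF and 0 <= offset <= 0xF):
        raise ValueError("page and offset must each fit in 4 bits")
    # PG supplies the upper nibble, the instruction supplies the lower nibble.
    return (page << 4) | offset


# Location F on page 3, as in the example above.
print(hex(ram_address(0x3, 0xF)))
```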
In the diagram I've got the input to PG coming from an immediate value, but thinking about it now, why doesn't it come from the accumulator? Then you can compute page addresses and have some indirection. The only reason I can see for not doing that is that you can't have a value in the accumulator and then save it to a specific 8-bit address - you need to clobber the accumulator to update PG. But you could save it temporarily to the current page - perhaps location 0 of each page is kept free for such a thing. More thinking is necessary here - a lot of this stuff has been paged out of my head.
Instructions come from a 4K ROM and are 8 bits wide. The 4-bit operand is either an immediate value, a RAM location, or (in the case of a jump) the lower 4 bits of the 12-bit jump target. So where do the other 8 bits come from? They come from the next location in ROM, which handily is already available - it's the input to the instruction register rather than the output. This is all thanks to the "pipelined" nature of instruction fetching. The jump logic will be the topic of the next update.
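To make the jump addressing concrete, here is a small Python sketch of how the 12-bit target might be assembled. The text only says that the operand holds the lower 4 bits and that the next ROM byte supplies the other 8; the exact bit order within that byte, and the name `jump_target`, are assumptions for illustration.

```python
def jump_target(operand: int, next_byte: int) -> int:
    """Build a 12-bit ROM address from a jump instruction.

    operand   -- low 4 bits of the jump instruction itself
    next_byte -- the byte that follows the jump in ROM (assumed here
                 to hold the upper 8 bits of the target, MSB first)
    """
    return ((next_byte & 0xFF) << 4) | (operand & 0xF)


# A jump with operand 5 followed by the byte 0xAB would go to 0xAB5.
print(hex(jump_target(0x5, 0xAB)))
```

The largest value this can produce is 0xFFF, which matches the 4K ROM.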
It would be nice to be able to make use of the carry output from the 74LS283 adder, but it's going to require at least one extra chip to store the carry bit, and maybe more to decode the opcode into a "write carry" signal.
The alternative is to OR all the accumulator's bits together to test for zero. That doesn't need any chips, just four diodes connected like this:
"jnz" (jump if accumulator is not zero) will be our conditional jump.
The question now is, how do you synthesise a carry bit in software if you don't have one in hardware? I couldn't find much information about this - a common definition of the carry bit is "1 if the result of A+B is less than A (or B)", but that's not very helpful - it's not very easy to do an unsigned comparison without a carry flag! In the end I found the answer in the source code for the Gigatron, which I knew doesn't have a carry flag.
Q = A + B
if top (sign) bit of Q is set:
    carry bit = top bit of (A & B)
else:
    carry bit = top bit of (A | B)
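The rule above (use A & B when the top bit of the sum is set, A | B when it is clear) can be checked exhaustively for 4-bit values. This is a Python sketch for verification only, not code from the Gigatron source; `synth_carry` is a name invented here.

```python
MASK = 0xF   # 4-bit word
TOP = 0x8    # top (sign) bit of a nibble


def synth_carry(a: int, b: int) -> int:
    """Return the carry out of a 4-bit add without using a carry flag."""
    q = (a + b) & MASK            # what the 4-bit adder would produce
    if q & TOP:
        return 1 if (a & b) & TOP else 0
    return 1 if (a | b) & TOP else 0


# Compare against the real carry for every pair of nibbles.
for a in range(16):
    for b in range(16):
        assert synth_carry(a, b) == (a + b) >> 4
print("carry rule holds for all 256 cases")
```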
This is where having a NAND operation becomes very useful. ANDing A and B is just a case of NANDing, then inverting:
nan f ; nand with 0b1111 = invert
OR is ~(~A . ~B), i.e. NAND with both inputs inverted. This requires a temporary location:
nan $notA ; ~A nand ~B == A or B
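The NAND identities used here are easy to sanity-check. Below is a Python sketch that builds NOT, AND and OR from a single 4-bit NAND and tests them against the built-in operators; the helper names are made up for this example.

```python
MASK = 0xF  # keep everything to 4 bits


def nand(a: int, b: int) -> int:
    return ~(a & b) & MASK


def not_(a: int) -> int:
    # NAND with 0b1111 inverts, like "nan f"
    return nand(a, MASK)


def and_(a: int, b: int) -> int:
    # NAND, then invert
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # NAND with both inputs inverted; ~A plays the role of the temporary
    not_a = not_(a)
    return nand(not_a, not_(b))


for a in range(16):
    for b in range(16):
        assert and_(a, b) == a & b
        assert or_(a, b) == a | b
print("NAND identities hold for all nibble pairs")
```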
Putting it all together, here's how to add two 8-bit numbers:
; input values
lda f ;a=0xff (big endian, stored at $0/$1)
lda 5 ;b=0x52 (stored at $2/$3)
; add two 8-bit numbers
lda $1 ; add lo nibbles
sta $5 ; store result at $5
nan %1000 ; check hi bit
lda $1 ; msb clr: a or b
sta $f ; $f = not a
set: lda $1 ; msb set: a and b
next: nan %1000 ; hi bit is carry
jmp addhi ; acc already zero if no carry
carry: lda 1
addhi: add $0 ; add hi nibs + carry
sta $4 ; store result at $4
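For comparison, here is the same 8-bit addition modelled in Python, using only 4-bit adds and the software carry rule. It is a sketch of the algorithm rather than a translation of the listing above, and the function name `add8` is an assumption.

```python
MASK = 0xF
TOP = 0x8


def add8(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Add two 8-bit numbers stored as (hi, lo) nibbles.
    Returns the (hi, lo) nibbles of the 8-bit result."""
    lo = (a_lo + b_lo) & MASK
    # Synthesise the carry from the top bits, no carry flag needed.
    if lo & TOP:
        carry = 1 if (a_lo & b_lo) & TOP else 0
    else:
        carry = 1 if (a_lo | b_lo) & TOP else 0
    hi = (a_hi + b_hi + carry) & MASK
    return hi, lo


# a = 0xff, b = 0x52, as in the listing above
hi, lo = add8(0xF, 0xF, 0x5, 0x2)
print(f"{hi:x}{lo:x}")  # low 8 bits of 0xff + 0x52
```

With these inputs the result is 0x51, i.e. the low 8 bits of 0x151.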
We need a way to load the accumulator with an immediate value ("lda 3"), or from a memory location ("lda $3"), so a mux is required there.
At a minimum we want to be able to add a value to the accumulator. We already have that mux so the value could be immediate or direct. Another mux is needed to select between "add" and "load". The input port can also feed into that mux.
At the cost of one extra chip, a logic function will be very useful. I've put NOR in the diagram but after playing around I think NAND is slightly more useful (easier to do AND for testing bits).
That allows for these instructions:
000m vvvv   lda   acc = value
001m vvvv   add   acc = acc + value
010m vvvv   nan   acc = ~(acc & value)
011x xxxx   in    acc = in
m: 0 = immediate value, 1 = value from given RAM address
v: 4-bit value
x: don't care
I really like how this turned out, because the decoding for these instructions requires no additional chips!
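Here is a rough Python model of those four instructions, handy for trying out bit patterns. It follows the encoding table above (top three bits select the operation, the next bit is m, the low nibble is v). The `page` and `in_port` parameters and the function name `step` are assumptions added so the sketch is self-contained.

```python
def step(instr: int, acc: int, ram: list[int], page: int = 0, in_port: int = 0) -> int:
    """Execute one 8-bit instruction and return the new accumulator value."""
    op = (instr >> 5) & 0b111   # 000=lda, 001=add, 010=nan, 011=in
    m = (instr >> 4) & 1        # 0 = immediate, 1 = value from RAM
    v = instr & 0xF             # 4-bit value or RAM location

    if op == 0b011:             # in: the low five bits are "don't care"
        return in_port & 0xF

    value = ram[(page << 4) | v] if m else v
    if op == 0b000:
        return value & 0xF
    if op == 0b001:
        return (acc + value) & 0xF
    if op == 0b010:
        return ~(acc & value) & 0xF
    raise ValueError("opcode not covered by this table")


ram = [0] * 256
ram[0x3] = 0x9
acc = step(0b0001_0011, 0, ram)    # lda $3
acc = step(0b0010_0001, acc, ram)  # add 1
print(hex(acc))
```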
Tiny 4-bit CPU with 4-bit input and output ports. Like the #TD4 CPU but a bit more capable.
Connect the output port to a SSD1306-style OLED display. These can be driven via SPI which would be simple enough.
Have it do something non-trivial - I love the TD4 for its simplicity, but can a slightly more capable CPU do something more interesting? How about finding prime numbers, or displaying a fractal?
256 words of RAM is plenty. It will have to be a Harvard architecture with a considerably bigger program ROM - 4Kx8 sounds good. Instructions would be 8 bits wide.
As few ICs as possible. The TD4 has 12 chips, but can only run very small programs. This Brainfuck machine has 14 chips and can run non-trivial programs, but you need to be a wizard to write programs for it. Let's see what can be done with fewer than 20. Instruction decoding can be done with the help of diodes.
If we're going to write characters to the display we need a font, which is a lot of constant data. The easiest thing is for that constant data to reside in ROM in the form of instructions, which populate RAM:
lda %1011 ; load accumulator with 4 bits
sta $1 ; store at some memory location (upper 4 address bits will come from somewhere else)
lda %0101 ; next 4 bits
The next part of the program then reads the data in RAM and clocks it out serially.
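Writing those lda/sta pairs by hand for a whole font would be tedious, so a small generator script could help. The Python sketch below is purely hypothetical: it emits text in the same "lda %bits" / "sta $addr" style as the snippet above, and the name `font_loader` and its `start` parameter are invented for this example.

```python
def font_loader(nibbles: list[int], start: int = 0) -> list[str]:
    """Turn a list of 4-bit values into lda/sta instruction pairs that
    copy them into consecutive locations of the current RAM page."""
    lines = []
    for i, nib in enumerate(nibbles):
        addr = start + i
        if not 0 <= nib <= 0xF:
            raise ValueError("each entry must fit in 4 bits")
        if addr > 0xF:
            raise ValueError("a page only has 16 locations")
        lines.append(f"lda %{nib:04b}")  # load accumulator with 4 bits
        lines.append(f"sta ${addr:x}")   # store at the next location
    return lines


for line in font_loader([0b1011, 0b0101], start=1):
    print(line)
```

Each nibble of constant data costs two 8-bit instructions in ROM with this approach, which is one reason the program ROM needs to be considerably bigger than the RAM.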
Control flow: we will have something simple like "jump if accumulator is zero". Is this enough? Can you have subroutines when you can only jump to an immediate address? Maybe if you have a jump table at the end of each subroutine, selecting which caller to jump back to.
Addressing: similar question - can all addresses be immediate, or do we need the ability to store addresses in RAM?
There will be many tradeoffs to look at - if removing a couple of chips causes the code size to balloon to make up for it, it may not be worth it.
Hi @Kyle McInnes, cool design concepts!
I agree with @zpekic, the 4-bit CPUs are a good architecture to learn and understand many things. Also, they can do cool things!
Take a look at,