Synchronous All-Purpose Minimal Instruction Reckoning Apparatus

An attempt to improve my first homebuilt CPU that resulted in a new design.

This CPU will be slightly faster, more powerful and extensible, and it will be able to address a much larger main memory than its predecessor! I am using the wire-wrap technique to make the electrical connections, and one board is almost done.

Please read the logs below to learn about the progress :)

SAMIRA MK-II is an 8-bit RISC von-Neumann-type accumulator machine without microcode. It has only one working register, the accumulator, and the result of every calculation is automatically stored back into it. Every instruction has a fixed length of two bytes: the first byte specifies the operation to perform, the second byte contains the operand, which can be either a data byte or an address. This keeps the decoder fairly simple and makes all instructions take the same time to execute. The 64kB main memory is divided into 256 blocks of 256 bytes each. Memory access (including branching) to blocks other than the active block, where code execution is happening, is possible only through additional commands and is therefore somewhat limited.

The modules of the CPU are:

CLK - master clock, can be stepped manually or runs on auto
CTRL - controls the state of the machine, running, halted or reset
SEQ - sequencer, controls the instruction cycle
DEC - decoder, a ROM lookup-table, generates control signals in the CPU
IR - instruction register, holds the current instruction
RAM - main memory, holds code and data
PC - program counter, holds current program position
OP - operand register, holds the current operand, can be data or an address
A - accumulator, the one and only main working register
ALU - arithmetic & logic unit, performs calculations and handles branches
B - block register, contains the active block address
C - the intermediate block register
IO - provides an extendable bus interface for additional devices

The ever repeating instruction cycle consists of these steps:

FI - fetch instruction, IR=M(PC)
INC - increment PC to next address, PC=PC+1
FO - fetch operand, OP=M(PC)
INC - increment PC again, PC=PC+1
EXE - command is decoded and executed, registers are modified
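
The fetch part of this cycle can be sketched in a few lines of Python. This is only an illustration under assumptions, not the actual hardware or emulator: the function name `fetch`, the flat `bytearray` memory and the 8-bit PC that wraps around inside the active block are hypothetical. Only the order of the steps is taken from the list above.

```python
def fetch(memory, b, pc):
    """Perform FI, INC, FO, INC once and return (ir, op, pc).

    Assumption: the block register B supplies the upper address byte
    and the PC is an 8-bit counter that wraps around inside the block.
    """
    ir = memory[(b << 8) | pc]   # FI:  IR = M(PC)
    pc = (pc + 1) & 0xFF         # INC: PC = PC + 1
    op = memory[(b << 8) | pc]   # FO:  OP = M(PC)
    pc = (pc + 1) & 0xFF         # INC: PC = PC + 1
    return ir, op, pc            # EXE would now decode IR and use OP


memory = bytearray(64 * 1024)    # 256 blocks of 256 bytes
memory[0x0200] = 0x12            # made-up instruction byte in block 2
memory[0x0201] = 0x34            # its operand byte
print(fetch(memory, b=2, pc=0))  # (18, 52, 2)
```

Because every instruction is exactly two bytes long, the four fetch steps are identical for all instructions and only EXE depends on the content of IR.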

The following instructions can be executed:

NOP - no operation, does nothing but wait
LD - loads a byte from memory to A
STO - stores the content of A into memory
ALU - arithmetic or logic operation is applied on A
JMP - program flow is changed unconditionally
BRN - program flow is changed based on conditions
C - modifies the intermediate block register
IN - reads data from a peripheral device to A or memory
OUT - writes data to a peripheral device from A or memory
HLT - halts the main clock and stops execution of code

There are three addressing modes available for most instructions:

Immediate - OP is directly used as data
Relative - OP points to an address within the active block (relative to the base of the block)
C-mode - OP points to an address within a remote block
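
How the three modes select a memory cell can be sketched as follows, using the upper and lower address bytes that the logs below give for each mode. The function name `effective_address` and the mode strings are hypothetical and only serve the illustration.

```python
def effective_address(mode, b, c, pc, op):
    """Return the 16-bit address selected by the addressing mode."""
    if mode == "immediate":
        hi, lo = b, pc      # address unchanged, OP itself is the data
    elif mode == "relative":
        hi, lo = b, op      # OP is an offset into the active block
    elif mode == "c-mode":
        hi, lo = c, op      # OP is an offset into the block held in C
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return (hi << 8) | lo


print(hex(effective_address("relative", b=0x02, c=0x10, pc=0x40, op=0x7F)))  # 0x27f
print(hex(effective_address("c-mode", b=0x02, c=0x10, pc=0x40, op=0x7F)))    # 0x107f
```

The same operand byte 0x7F lands in block 0x02 in relative mode and in block 0x10 in C-mode; only the source of the upper byte differs.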

The ALU can add, subtract, apply bitwise NAND and NOR.
The branch logic can act depending on the flags zero, carry, sign and input request.
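
A rough model of the four ALU operations and three of the flags, as a sketch only. The function name `alu`, the operation strings and above all the carry convention (carry-out on add, borrow on subtract, always clear for NAND and NOR) are assumptions; the real hardware may define carry differently. The input request flag is left out because it depends on the peripherals.

```python
def alu(operation, a, operand):
    """Apply one ALU operation to the accumulator value `a`.

    Returns (result, flags). The carry convention is an assumption:
    carry-out for ADD, borrow for SUB, always clear for NAND/NOR.
    """
    if operation == "ADD":
        raw = a + operand
    elif operation == "SUB":
        raw = a - operand
    elif operation == "NAND":
        raw = ~(a & operand) & 0xFF
    elif operation == "NOR":
        raw = ~(a | operand) & 0xFF
    else:
        raise ValueError(f"unknown operation: {operation}")
    result = raw & 0xFF                  # the accumulator is 8 bits wide
    flags = {
        "zero": result == 0,
        "carry": raw != result,          # result left the 0..255 range
        "sign": bool(result & 0x80),     # bit 7 set
    }
    return result, flags


print(alu("ADD", 0xFF, 0x01))   # (0, {'zero': True, 'carry': True, 'sign': False})
print(alu("NAND", 0xFF, 0x0F))  # (240, {'zero': False, 'carry': False, 'sign': True})
```

NAND alone is enough for bitmasks: a NAND with the mask followed by a NAND with 0xFF (which inverts every bit) gives a plain AND.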

More details will follow!

  • Log04

    menguinponkey, 03/04/2017 at 12:34

    I started working on the second circuit board, programmed the ROM ICs for the decoder look-up table and realized that I am able to read out B and C, which allows for much more flexible programs, at the cost of only one additional tristate buffer!

  • Log03

    menguinponkey, 11/05/2016 at 18:27

    There is a German proverb - "Den Wald vor lauter Bäumen nicht sehen" - not to see the forest for all the trees. Forget what I wrote in the last log entry about the F-mode. [edit: deleted it] It is called the C-mode now. The C-register is an intermediate register for the B-register, the block register. C is needed to keep code execution within the current block while writing to the B-register. Otherwise the write would be a kind of jump, but one that only changes the block and not the program counter, which would be very impractical to handle.

    But there is something I did not see, which seems embarrassingly obvious now: there is nothing that stops me from using this C-register to transfer data between blocks - no 0xFF needed :) With only a minimal change and at the cost of one additional multiplexer that selects either B or C as the source for the high address byte, I can access (read, write, jump to) any address in any other block from within each block. The only pitfall is that I have to manually change the B-register before and after transferring data between blocks, and I already see myself forgetting to restore B to its old value ;)

    So the three addressing modes are now:

    Immediate: Hi=B, Lo=PC
    Relative: Hi=B, Lo=OP
    C-Mode: Hi=C, Lo=OP
    (Hi=upper byte of effective address, Lo=lower byte)

  • Log02 - main memory

    menguinponkey, 06/10/2016 at 09:12

    Alright, so the 'small' change to the structure of the address logic was more severe than I realized. It was still a small change, but one with a large impact. I had to think the whole structure over again, and I will use this opportunity to write a bit more in depth about various design decisions regarding main memory management. This will explain one main reason why I am building a new machine at all.

    The first machine had two addressing modes, immediate and absolute, which determined if the operand byte is to be interpreted as data or as an address.

    But how can we address more than 256 bytes of memory? In my first design this did not matter because I really only used the first 256 bytes of main memory, so one operand byte was sufficient to hold the effective address while accessing a memory cell. In fact that limitation made the hardware very simple. But a lot of simple instructions are needed, even for small programs, so memory becomes scarce easily.

    So we (want to) have 64kB of main memory now (16 address lines, which equals two bytes), but our operand only holds one byte. Of course we could add a second operand byte. But it would make the machine slower and eat up a lot of the newly gained memory if we had to extend every instruction to a length of 3 bytes. Besides that, if we want to use the operand as data, one byte gets wasted, and we don't want that either. So we could modify the sequencer to dynamically read and use either one or two operand bytes, depending on the instruction. But that would make the sequencer more complex (maybe even require microcode) and mess up the alignment and readability of the memory; execution of code would still take more time, and execution times would be less predictable. Hm, maybe we should extend the whole architecture to 16 bits then? No, this would require more than twice the number of integrated circuits.
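
To put a number on the memory cost of a third instruction byte, here is a quick back-of-the-envelope calculation. It assumes, purely for illustration, that the whole 64kB is filled with instructions; the helper name `instruction_capacity` is hypothetical.

```python
def instruction_capacity(memory_bytes, instruction_length):
    """Number of whole instructions that fit into the given memory."""
    return memory_bytes // instruction_length


MEMORY_BYTES = 64 * 1024

print(instruction_capacity(MEMORY_BYTES, 2))  # 32768
print(instruction_capacity(MEMORY_BYTES, 3))  # 21845
```

Going from two to three bytes per instruction would cost about a third of the instruction capacity, and every instruction would need one more fetch step.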

    You see now that it is not a trivial task to extend my original design to address a larger memory without significant drawbacks or changes. But I still wanted to stick to it and found an elegant solution, here is how it works:

    The main memory is divided into smaller blocks of 256 bytes each. The operand byte can now address cells within this block, but not outside of it. So in the 64k configuration we have 256 'bubbles' of memory where data can be stored and code can be executed without the drawbacks mentioned above. We use an additional one-byte block register B to hold the active block, which in a way extends the program counter. Now we can operate within such a block, and a program can be split into small pieces which each fit into a block and run there independently.

    But how do we redirect the program flow to another block, or access it? We introduce register C, the intermediate block register. Before 'leaving' the active block, we use an instruction to write the target block to register C. As soon as a jump instruction is executed afterwards, the content of C is transferred to B, and we have changed both the upper byte (the block) and the lower byte (the program counter, the actual address within that block). That allows us to execute code fast as long as we stay within a block and use 'local variables', and even if we leave a block, the overhead is minimal. Furthermore, the C register can be used instead of B as the upper byte of the effective address, which enables us to access data outside of the active block.
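
The two-step block change can be sketched like this. It models only what this log describes (write the target block to C, then a jump copies C into B and loads the PC); the names `AddressRegs`, `write_c` and `jump` are hypothetical and not taken from the real emulator.

```python
from dataclasses import dataclass


@dataclass
class AddressRegs:
    b: int = 0    # active block
    c: int = 0    # intermediate block register
    pc: int = 0   # position inside the active block


def write_c(regs, block):
    """Prepare a block change; execution stays in the current block."""
    regs.c = block & 0xFF


def jump(regs, target):
    """Jump as described in this log: C is copied to B, PC is loaded."""
    regs.b = regs.c
    regs.pc = target & 0xFF


regs = AddressRegs(b=0x00, c=0x00, pc=0x20)
write_c(regs, 0x05)
print(regs)                          # AddressRegs(b=0, c=5, pc=32)
jump(regs, 0x10)
print(hex((regs.b << 8) | regs.pc))  # 0x510
```

Writing C does not touch B, so the instruction that follows the write is still fetched from the old block. The block only changes at the moment of the jump, together with the PC.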

    So now we are left with three addressing modes:
    Immediate: Hi=B, Lo=PC (effective address is not changed, data is already available in OP)
    Relative: Hi=B, Lo=OP
    C-mode: Hi=C, Lo=OP
    (Hi=upper byte of effective address, Lo=lower byte)

    I hope this will be sufficient to write larger and more complex programs while still maintaining a well-understandable structure in both hard- and software.

    Confusing? I hope so, because it took me a while to figure this out :D

    Maybe you can follow my thoughts and concepts and I would appreciate any feedback on this matter.

    See you around :)

  • Log01

    menguinponkey, 05/17/2016 at 19:46

    Wire-wrapping rocks! I'm progressing really fast, the first board (ALU) is almost done, I updated the picture in the gallery.
    I realized I have to improve a small part of the address-logic, but that won't be a large problem.
    Next log will be more extensive and will follow soon :)

  • Log00 - why? why not!

    menguinponkey, 05/12/2016 at 22:14

    Alright, so what did I learn from the first version of S.A.M.I.R.A. and why am I building a second one?

    1) The ALU was alright. The MK-II will possess a marginally more powerful ALU, which will not only be able to add and subtract, but will also use NAND and NOR gates to perform logic operations and bitmasks.

    2) The sequencer will be slightly faster and more efficient.

    3) The decoder will no longer be hardwired but implemented in a ROM lookup table.

    4) Branch logic will be improved. Branching conditions will be: Zero, Carry, Sign, Input. All of them can be used inverted.

    5) I need more memory. Version 1 is limited to 256 bytes (!) of main memory. This is due to the address bus being of the same width as the data bus, which simplified the design. If I ever want to have an operating system running on my machine, 256 bytes won't get me anywhere. This problem will be solved in the MK-II by splitting up a much larger memory (up to 64k) into smaller blocks of 256 bytes. Access to and execution of content will only be possible within such a block, but switching between blocks will be enabled by a special instruction.

    6) The hardware design needs some serious fixing. Testing the design on a breadboard will not be necessary again, because I gathered sufficient experience with digital logic and the design already works in Logisim and in my emulator. Also, soldering all connections does not work very well: neither the 'traditional' way (wires go down through the holes and are soldered to the IC on the bottom side of the board) nor my pseudo-wire-wrap technique (thin wires soldered directly onto the pins of the ICs on the bottom side of the board).
    So what I will do: ICs on top, pin headers parallel to them, soldered together on the bottom side, but wire-wrapped on the top side. This will be easier, faster and cleaner. Pin headers are not perfect for wire-wrapping, but they are alright. The whole system will be split onto separate boards, which will be connected via one 'main bus board'. Extensions and peripherals can simply be added by plugging them into the same bus board.

    So eventually I WILL be able to run my own operating system (probably FORTH-like) and operate it through a serial terminal. Let's do this :)


esot.eric wrote 11/06/2016 at 22:09

Cool, never seen component-side wire-wrapping before. Looks great

Groovy background-image, too.

Yann Guidon / YGDES wrote 11/06/2016 at 11:59

Log03 got duplicated :-)

Oh and if your instructions are fixed 2 bytes, it's a pain to see you spend 4 cycles fetching 2 bytes. Just increase PC by 2 and toggle the LSB ? :-)

Keep faith !

menguinponkey wrote 11/06/2016 at 13:45

Thank you for the hints! There is no way to increment the PC by 2 though, this machine does not have microcode, therefore the PC can not be manipulated by the ALU, in hardware it is simply a loadable binary counter! Execution speed never really was a priority, but I will consider decreasing the period of the fetch cycles!

Yann Guidon / YGDES wrote 11/06/2016 at 14:19

Your PC incrementer can remain physically the same, but the LSB will come from the FSM :-) (the counter's MSB will be unused)

Going from 5 to 4 steps per instruction, a simple 2-bits counter is enough, its LSB can go to the LSB of the instruction address. Or something like that.

Even simpler : having a 16-bits memory bus, or two 8 bits memory banks (odd&even), so they can be accessed at the same time. Normal data access will simply select the memory bank from the address' LSB.

menguinponkey wrote 03/05/2017 at 20:57

Well.. No. :p I thought a lot about how and why I want to design my CPU as I did, I'm quite happy with the result and I am not going to change such fundamental attributes like bus width etc.. Maybe next time ;)
