
SPAM-1 - 8 Bit CPU

8 Bit CPU with simulator and toolchain and a hardware build to follow

SPAM-1 : Simple Programmable And Massive - v1/v2
"Massive" in the sense that it's going to be built on breadboard.

v1 was simulated in Logisim Evolution with an assembler written in Google Sheets - I've not seen this done before - is it a first? My intention was to make the tooling as accessible as possible and more visual.

However, once that initial effort and sim was completed I decided that this was too trivial to actually build as hardware. So I decided to bite off a lot more.

The latest design, v2, is a heck of a lot more complex but also more capable. It won't rely on Google Sheets, because the effort of programming in that environment is too high and because I want to be able to use Icarus Verilog and other such tools.

V1 objective 

⭐️ I wanted to do things a little differently to some of the other efforts on the internet! 

I wanted the assembler and any other code I write to be more readily accessible and instantly usable to others (like you) without installing Python or Perl or whatever first, so I've written the assembler in Google Sheets!

  • I want to be able to run at least the typical demo programs like Fibonacci
  • I would like to extend it to play some kind of basic game (tbd)
  • It will have an assembly language and assembler for it
  • I might port C to it
  • I want to simulate it first
  • I want to build it physically, or more likely a derivative

The Assembler is in Google Sheets ....

However, on completion of this phase of the project I made these observations (sorry about the font size)...

So I changed tack ...

V2 objective

I got v1 working in the sim but then decided I wasn't going to build it in hardware, as these very simple CPUs aren't that capable or complex and I wanted more of a challenge; something that would force me to learn more.

So this has changed out of all recognition.

Along the way I got distracted by a hardware build of a testing device for all the chips I've bought for this CPU project. The testing device project can be found in my project list. That was interesting and a throwback to my Uni days in the 80s.

Now, I'm firmly back on the CPU task and nearing the end of the design and sim phase.

Check out the project logs!

Graphics Interchange Format - 2.18 MB - 08/03/2019 at 19:59


  • 1 × Logisim Evolution
  • 1 × Google Sheets

  • Destroyed my first chip

    John Lonergan - 2 days ago - 0 comments

    Forget static discharge.

    Instead, try running a high capacity lithium ion power bank through a register chip that's wired up incorrectly. 

    The breadboard practically melted and I burned a chip-shaped mark in my forefinger before I figured out what was up.

    Stupido.

    Perhaps a current limited supply might be safer - I don't have one. 

    Would measuring the resistance from Vcc to Gnd before applying power have spotted the issue before things melted? It might, but it would be a PITA having to keep making that measurement after each wiring change.

  • Hardware finally underway

    John Lonergan - 2 days ago - 0 comments

    The two memory address registers and the program counter are built.

    The layout of the various breadboards and components was planned with pen and paper and counting pins....

  • Playing with the DM85S68 16x4 synchronous register file

    John Lonergan - 07/25/2020 at 23:19 - 0 comments

    The DM85S68 is an oldie. I came across it when researching register files for SPAM1, and when I bought a few 74HCT670s I also bought a few DM85S68s (at least I hope they're not counterfeit).

    Unlike the 74HCT670, which is entirely asynchronous, the DM85S68 has a synchronous load and an async read. This means it's probably better suited to my needs than the 74HCT670. Where I've designed around the 670 I've had to add a 74HCT574 to latch the data input, because without it the register file and the ALU would form one big combinatorial circuit with a feedback loop; not good.

    Also, the 74670 is a 4x4 register file versus the 16x4 DM85S68. Whilst I'm only building 4 registers that's immaterial, but if I went for the full 8 registers that my addressing currently permits then the DM85S68 is perhaps a better choice.

    The big downside of the DM85S68 is perhaps that its supply current is rated at 70mA typical, whereas the 74HCT670 will use a fraction of that.

    The logic diagram is shown below and you can see that the output stage is interesting in that it contains a latch.

    The output latch appears to be some kind of SR latch. I also had a go at simulating it in CircuitVerse and also in Falstad.

    I won't use the DM85S68 immediately as I'd need to create a Verilog model for it - but it's definitely on the todo list for a variation.

    You can find the DM85S68 datasheet here: https://www.datasheetarchive.com/pdf/download.php?id=96fbc3fe56bdaed595c8bb5f81c37e1b016adc&type=M&term=DM85S68

    There are also some original design ideas in this old data book: http://www.bitsavers.org/components/national/_dataBooks/1978_National_Memory_Applications_Handbook.pdf

  • ALU Design finally completed and tested

    John Lonergan - 07/25/2020 at 23:01 - 0 comments

    I finally got to the end of designing the ALU - the result is documented here: https://github.com/Johnlon/spam-1/blob/master/docs/alu_with_carry_in.md

    0-7 ALU Ops | 8-15 ALU Ops  | 16-23 ALU Ops     | 24-31 ALU Ops
    0           | B-1           | A*B (high bits)   | A RRC B
    A           | A+B *1        | A*B (low bits)    | A AND B
    B           | A-B *1        | A/B               | A OR B
    -A          | B-A *1        | A%B               | A XOR B
    -B          | A-B (special) | A << B            | A NAND B
    BA / 10     | A+B+1 *2      | A >> B arithmetic | NOT B
    BA % 10     | A-B-1 *2      | A >> B logical    | A+B (BCD)
    B+1         | B-A-1 *2      | A RLC B           | A-B (BCD)

    This ALU is based on Warren Toomey's ALU for CscVon8 but with a few significant differences that are detailed in the ALU design page for SPAM-1.

    There is a verilog implementation of the ALU https://github.com/Johnlon/spam-1/blob/master/verilog/alu/alu_code.v and a bunch of unit tests for each operation https://github.com/Johnlon/spam-1/blob/master/verilog/alu/test.v.

    Once I had that passing I needed to generate an image file for burning a ROM. I also wanted to use the same data file to drive an alternative implementation, a Verilog ROM loaded from that file, so that I could run the CPU and the unit tests against it.

    The approach I took was to create the data file by writing a small Verilog program that applied all possible input values to the Verilog ALU implementation and then wrote the inputs and outputs to disk as a ROM image.

    The program was pretty simple https://github.com/Johnlon/spam-1/blob/master/verilog/alu/gen_alu.v and generated all the files I need to create some physical ROMs when I get to the H/W build - which will be as soon as I can clear a space at home to start work.
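    The exhaustive "apply every input, record every output" idea can be sketched in a few lines of Python. This is not the gen_alu.v program itself, just an illustration under assumptions: the `alu` function below models only four ops with a made-up numbering, it ignores any flag outputs, and the address layout (op in the top bits, then A, then B) is hypothetical.

```python
def alu(op, a, b, width=8):
    """Illustrative subset of ALU ops; the op numbering here is hypothetical."""
    mask = (1 << width) - 1
    if op == 0:
        result = 0
    elif op == 1:
        result = a
    elif op == 2:
        result = b
    elif op == 3:
        result = a + b
    else:
        raise ValueError(f"op {op} is not modelled in this sketch")
    return result & mask  # truncate to the operand width


def build_rom(n_ops=4, width=8):
    """Apply every (op, a, b) combination and store the result at that address."""
    rom = bytearray(n_ops << (2 * width))
    for op in range(n_ops):
        for a in range(1 << width):
            for b in range(1 << width):
                # Assumed address layout: op | A | B, most significant first.
                address = (op << (2 * width)) | (a << width) | b
                rom[address] = alu(op, a, b, width)
    return bytes(rom)
```

    Writing `build_rom()` to a file in binary mode would give an image in which looking up an address is equivalent to running the modelled op, which is the property that lets the same data drive both a physical ROM and a simulated one.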

  • Single Cycle CPU Confusion

    John Lonergan - 07/19/2020 at 23:53 - 0 comments

    I was writing up some design/research notes on timing considerations and I was considering whether it was possible to update async RAM in a single cycle.

    I have heard the term "Single Cycle CPU" and was trying to understand what single cycle CPU actually means. Is there a clear definition, and a consensus on what it means?

    The home brew "single cycle CPUs" I've come across seem to use both the rising and the falling edges of the clock to complete a single instruction. Typically, the rising edge acts as fetch/decode and the falling edge as execute.

    However, in my reading I came across the reasonable point made here ...  https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html    

         "Do not transition on any negative (falling) edges. Falling edge clocks should be considered a violation of the one clock principle, as they act like separate clocks.".

    This rings true to me. Changing state on the rising and falling edges (or high and low phases) is effectively the same as changing state on the rising edge of two cycles of a clock that's running twice as fast; and this would be a "two cycle" CPU, wouldn't it?

    So is it honest to state that a design is a single cycle CPU when both the rising and falling edges are actively used for state change?

    It would seem that a true single cycle cpu must perform all state changing operations on a single clock edge of a single clock cycle.

    I can imagine such a thing is possible providing the data storage is all synchronous. If we have a synchronous system that has settled, then on the next clock edge we can clock the results into a synchronous data store and simultaneously clock the program counter on to the next address. But if the data store is async then the control lines would be changing whilst that data is being stored, leading to unintended behaviours.
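    A toy model of the all-synchronous case, as a sketch only: the instruction format `(dest, a, b)` meaning `regs[dest] = regs[a] + regs[b]` is hypothetical and has nothing to do with the SPAM-1 instruction set. The point is that `next_state` is pure combinational logic over the settled current state, and the single assignment in `run` plays the role of the one clock edge that commits the register file and the program counter together.

```python
from typing import NamedTuple


class State(NamedTuple):
    pc: int
    regs: tuple  # stands in for a synchronous register file


def next_state(state, program):
    """Combinational logic: derive the next state from the settled current state."""
    dest, a, b = program[state.pc]
    result = (state.regs[a] + state.regs[b]) & 0xFF
    regs = list(state.regs)
    regs[dest] = result
    return State(pc=state.pc + 1, regs=tuple(regs))


def run(program, regs):
    state = State(0, tuple(regs))
    while state.pc < len(program):
        # The "clock edge": PC and registers are replaced in one step,
        # so nothing observes a half-updated state.
        state = next_state(state, program)
    return state
```

    Because the old state is never modified in place, an instruction that reads and writes the same register still sees the old value while computing, which is what a synchronous store gives you and an async one does not.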

    Am I wrong, are there any examples of such that include async storage in the mix?

    It would seem that using async RAM in one's design means one has at least "two clock cycles".

    Of course, with some more complexity one could perhaps add an extra cycle when accessing async data storage, but again that still wouldn't be a single cycle CPU, rather a mostly single cycle CPU.

    So is there a commonly accepted definition of a single cycle CPU, and are we applying the term consistently?

  • Single Cycle?

    John Lonergan - 07/15/2020 at 01:37 - 0 comments

    It seems that with a little thought I should be able to flip the impl into a single cycle design. The sticking point is that direct addressing the ROM means two cycles; one to load the instruction from the ROM and a second cycle to use the instruction to direct address the ROM and execute. 

    It's pretty clear that it cannot be done.

    So if I wanted to move to single cycle then I need to entirely separate program memory from data memory. The current design allows direct addressing either the RAM or the ROM so the ROM is multipurposed and this is where the issue lies. 

    If alternatively I entirely separate the program memory and data then the ROM is used only to provide instructions and the direct addressing applies only to RAM. 

    If I have lookup tables in ROM that I need for calcs or whatever then they would need to be copied to RAM, which I can do using immediate addressing to supply the data. 

    Restricting direct addressing to RAM only means I can ditch the instruction registers.

    Single cycle avoids the complexity around the multi cycle, 3 phased approach I currently have.

    Have to try the Verilog simulation and see what gives.

  • Micro-Cap - Pretty cool

    John Lonergan - 07/05/2020 at 12:06 - 0 comments

    Saw on the TTL'ers chat group that the analysis and sim package Micro-Cap 12 is free and has a load of 74xx components in the library as well as analogue stuff, so I decided to have a look.

    I was pleased to see that components like the 74HCT4017 and even the 74HCT670 register file are in there which is excellent and surprising.

    As an experiment I built a trivial sim using the 74HCT4017 phased clock generator and noticed that the sim was showing some behaviour that my own Verilog sim wasn't; MicroCap looked correct.

    It turned out that I'd forgotten to include the timing delays in my 74HCT4017 verilog model. So I'll probably spend a little more time messing with MicroCap, though I don't know if I have the stamina for a full blown sim.

    Definitely worth a look though (the interface takes a bit of getting used to).

  • The musical box lives!

    John Lonergan - 06/26/2020 at 01:33 - 0 comments

    I have now reorganised the simulation so that I have two competing implementations of the control logic. 

    • The original one with complicated decoder logic and minimal ROMs and consequently multiple instruction types, 
    • and also a new (or rather old) approach with a horizontal instruction encoding scheme rather similar to the original design, except with many more control lines and with a minimal amount of trivial decoder (74139/74139) logic.

    There is a large net saving in control logic chips as expected, at the expense of spreading the instruction over 48 bits of ROM. Of course CPUs like the MIPS use narrower instructions, but my approach means no tricky decode logic; I can live with that.

    A side effect of this redesign is 

    • there is now really only one instruction, "DEVA = DEVB (ALUOP) DEVC", where devices A/B/C can be any of the devices on board, and direct vs. register addressing of the RAM or ROM is orthogonal. BTW, right from the original design the jumps are just special devices that conditionally accept the update from the ALU, or not, depending on status flags.
    • it enables direct addressing of the RAM or ROM simultaneously with use of an immediate in the instruction
      • "RAM[DIRECT ADDRESS] = DEVICE (+) IMMEDCONST"
      • "DEVICEA = RAM[DIRECT ADDRESS] (+) DEVICEB"
      • and so on
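    To make the "one wide instruction, no tricky decode" idea concrete, here is a sketch of packing a 48-bit horizontal instruction word and slicing it into six bytes, one per ROM. Every field name and width below is an assumption made up for illustration (only the 48-bit total and the 32 ALU ops, hence a 5-bit ALU op field, come from the logs); the real SPAM-1 layout is documented in the GitHub project.

```python
# (name, width) from most to least significant bit; all hypothetical.
FIELDS = [
    ("aluop", 5),      # 32 ALU operations
    ("target", 4),     # DEVA
    ("src_a", 4),      # DEVB
    ("src_b", 4),      # DEVC
    ("direct", 1),     # direct vs. register addressing
    ("unused", 6),     # spare bits "in the middle"
    ("address", 16),   # direct address
    ("immediate", 8),  # immediate constant
]
assert sum(width for _, width in FIELDS) == 48


def encode(**values):
    """Pack named fields into one 48-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack a 48-bit word back into its named fields."""
    fields = {}
    shift = 48
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def split_across_roms(word, n_roms=6):
    """Byte i is what ROM i would hold at this instruction's address."""
    return [(word >> (8 * i)) & 0xFF for i in range(n_roms)]
```

    For example, `split_across_roms(encode(aluop=9, target=2, address=0x1234, immediate=0x42))` gives the six bytes to burn at the same address in six parallel ROMs, and each control wire is then just a fixed output bit of one of them.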

    This approach doesn't use all the bits in the ROMs. There are a few unused bits in the middle and I'm considering whether these might be used to implement conditional instructions like the early ARM chips, possibly as an alternative to the more common jump approach I'm using already. So there's room for experimentation too.

    So I think it's worth it and I'll stash the old more complex control logic approach.

  • More is Less

    John Lonergan - 06/19/2020 at 22:24 - 0 comments

    A theme with this project has been its ever increasing complexity.

    In the current design I'd decided 24 bits of instruction was enough to deal with, but this means that I need complex decoding logic in the control unit to multiplex bits in the instruction into the various devices; a single bit might represent a bit of an address or a constant or an ALU operation depending on the instruction type.

    Last night when updating the docs in the github project I made a comment about the horizontal encoding that I'd used in the absolutely initial design where I'd called this device "simple". 

    The comment I made in github was about horizontal encoding: "I quite like the idea that it would be rather like a hand cranked music box."

    Hand cranked music box

    Horizontal encoding is similar to a mechanical music box because of the trivial control logic; the musical box has a trivial system of tuned prongs and a simple set of spikes on a programmed wheel that pluck those individual prongs.

    Today, on the way back from a rare trip to the shops (Covid 19 etc.) I was reflecting on the complexity of the control logic that I had. I did a quick mental calculation of how many ROMs it would take if I went back to a stricter horizontal encoding, with each control wire hooked to a specific output bit of a ROM; no decoders, nothing. Replacing all that logic would take nine ROMs. Arguably that is a fair trade-off: gaining simplicity for the sake of a few extra ROMs, swapping a lot of little chips and wires for a few larger chips.

    By the time I'd gotten home, I'd gone on to calculate how many ROMs it would take to represent every control wire assuming I was going to use nothing more than a single layer of decoders. This left me with 6 ROMs plus a few decoder chips.
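    The mental arithmetic above boils down to counting bits and dividing by the ROM width. The sketch below assumes 8-bit-wide ROMs and treats each group of mutually exclusive control lines as something a decoder can expand from a binary code. The wire counts in the usage note are made up, chosen only so that the totals land on the nine and six ROMs mentioned above; they are not the real SPAM-1 control line counts.

```python
ROM_WIDTH = 8  # assumed: one byte of control bits per ROM


def roms_needed(bits):
    """Number of ROMs needed to supply the given number of control bits."""
    return -(-bits // ROM_WIDTH)  # ceiling division


def horizontal_bits(one_hot_groups, independent):
    """Strict horizontal encoding: every control wire gets its own ROM bit."""
    return sum(one_hot_groups) + independent


def decoded_bits(one_hot_groups, independent):
    """One layer of decoders: an n-way one-hot group needs ceil(log2(n)) bits."""
    return sum((n - 1).bit_length() for n in one_hot_groups) + independent
```

    With hypothetical groups of 16, 16 and 8 one-hot select lines plus 32 independent wires, `roms_needed(horizontal_bits([16, 16, 8], 32))` is 9 and `roms_needed(decoded_bits([16, 16, 8], 32))` is 6.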

    I found myself realising that the home brew CPU designs that I've seen seem to avoid storing the instruction across multiple ROMs. I see 24 or 32 bit instructions out there, but these tend to be held in a single ROM and loaded into instruction registers over a sequence of clock pulses to achieve the width.

    I could do the same of course. I could use a single 27C4001 to hold all the data and load 6 instruction registers over 6 clock cycles. But these ROM devices are slow, so that approach would be slow, and anyway I'd still end up with a similar amount of wiring hassle as using a bunch of ROMs in parallel.

    So I'm edging towards a rework where I go with either a ROM-only horizontal approach using 8 or 9 ROMs, or a minimal decoding approach using just 6 ROMs.

    ... to be continued ....

  • ALU Optimisations - final amendment hopefully

    John Lonergan - 06/19/2020 at 00:46 - 0 comments

    Updated the ALU operations to allow me to select whether I want the addition/subtraction ops to take carry-in into account.

    I'm also going to rewrite my ALU Verilog so it's actually ROM based, as that's what the physical impl will be. I expect that approach will make automated testing of the logic easier too.

    More details here: https://github.com/Johnlon/spam-1/blob/master/docs/alu_with_carry_in.md

    Updated ops are:

    0-7 ALU Ops | 8-15 ALU Ops  | 16-23 ALU Ops     | 24-31 ALU Ops
    0           | B-1           | A*B (high bits)   | A ROR B
    A           | A+B *1        | A*B (low bits)    | A AND B
    B           | A-B *1        | A/B               | A OR B
    -A          | B-A *1        | A%B               | A XOR B
    -B          | A-B (special) | A << B            | NOT A
    A+1         | A+B+1 *2      | A >> B arithmetic | NOT B
    B+1         | A-B+1 *2      | A >> B logical    | A+B (BCD)
    A-1         | B-A+1 *2      | A ROL B           | A-B (BCD)

    *1 these ops will be used if the instruction directly selects ops 9/10/11, or, when the instruction selects 13/14/15 but carry-in is not set
    *2 these ops are selected when the instruction is selecting ops 13/14/15 and carry-in is set; if carry-in is not set then see *1
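    The two footnotes describe a small remapping step that can be sketched as below. The pairing of op 13 with 9, 14 with 10 and 15 with 11 is an assumption read off the row order of the table (A+B+1 falls back to A+B, and so on); the function and table names are hypothetical.

```python
# Assumed pairing, read off the table: the carry-aware op and its plain fallback.
PLAIN_VARIANT = {13: 9, 14: 10, 15: 11}


def effective_op(op, carry_in):
    """Return the ALU op actually performed for the selected op and carry-in."""
    if op in PLAIN_VARIANT and not carry_in:
        return PLAIN_VARIANT[op]  # footnote *1: carry-in not set
    return op  # footnote *2, or an op unaffected by carry-in
```

    So a program can always select 13 (add with carry) for the upper bytes of a multi-byte addition and get a plain A+B whenever the previous byte did not carry.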

