A byte-wide stripped-down version of the YGREC16 architecture

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of DRAM (plus parity)
* 8 byte-wide registers
* fewer relays than the YGREC16
This core is so simple that I now focus on the debug/test access port and the register set's structure.
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA and ASIC.

I give up on the idea of playing the Game of Life (the forte of #YGREC-РЭС15-bis) but I design a VHDL version because @llo sees the YGREC8 as a perfect replacement for PICs for his #SteamBot Willie !

A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is expected to be:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

I shrunk the instruction word down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field.

I have swapped the condition field and the ALU code field, which is now a more classical opcode.

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...

There are two classical instruction forms : either an IMM8 field, or a source & condition field, combined with the destination field and a small opcode. The source field can also become a short immediate field (3 bits only but essential for conditional short jumps or increments/decrements).

The opcode field has 4 bits and the following values:

Logic group :

  • OR  => Reg OR Reg does not change Reg
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware : There is no point in adding 0, so ADD with a short immediate (Imm3) skips the value 0 and the range is -4 to -1 and +1 to +4. (see 17. Basic assembly programming idioms)
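A minimal sketch of how such a zero-skipping Imm3 could be decoded. The bit-level encoding is not given here, so the mapping below (3-bit two's complement with the non-negative half shifted up by one) and the names `decode_imm3` and `encode_imm3` are assumptions for illustration only.

```python
def decode_imm3(field: int) -> int:
    """Map a 3-bit field to a signed value that skips 0.

    Assumed encoding (not from the source): read the field as 3-bit
    two's complement (-4..+3), then shift the non-negative half up by one.
    """
    if not 0 <= field <= 7:
        raise ValueError("Imm3 is a 3-bit field")
    signed = field - 8 if field & 0b100 else field  # -4..+3
    return signed + 1 if signed >= 0 else signed    # -4..-1 and +1..+4


def encode_imm3(value: int) -> int:
    """Inverse mapping; the value 0 has no encoding."""
    if value == 0 or not -4 <= value <= 4:
        raise ValueError("Imm3 covers -4..-1 and +1..+4 only")
    signed = value - 1 if value > 0 else value
    return signed & 0b111
```

With this mapping the eight codes cover eight useful increments instead of wasting one code on a no-op.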

Shift group (optional)

  • SHR
  • SHL
  • SAR
  • ROL

Control group:

The COND field has 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available.  All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S  (Sign, MSB)
  • B0, B1, B2, B3 (input signals)

Instruction code 0000h should map to NOP, and the NEVER condition. (???)
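The condition field can be sketched as a 3-bit selector plus one negation bit. The selector numbering, the polarity of the negation bit (chosen here so that an all-zero field reads as NEVER) and the names `CoreState` and `condition_met` are assumptions, not the actual encoding.

```python
from dataclasses import dataclass


@dataclass
class CoreState:
    z: bool = False   # Zero flag
    c: bool = False   # Carry flag
    s: bool = False   # Sign flag
    inputs: int = 0   # B0..B3 input signals in bits 0..3


def condition_met(cond: int, state: CoreState) -> bool:
    """Evaluate a 4-bit COND field.

    Assumed layout: bits 2..0 select the signal (0 = Always, 1 = Z,
    2 = C, 3 = S, 4..7 = B0..B3); bit 3 = 1 uses the signal as-is,
    bit 3 = 0 negates it, so COND = 0 means NEVER.
    """
    if not 0 <= cond <= 15:
        raise ValueError("COND is a 4-bit field")
    sel = cond & 0b111
    if sel == 0:
        raw = True
    elif sel == 1:
        raw = state.z
    elif sel == 2:
        raw = state.c
    elif sel == 3:
        raw = state.s
    else:
        raw = bool((state.inputs >> (sel - 4)) & 1)
    return raw if cond & 0b1000 else not raw
```

Eight selectable signals times two polarities fill the 16 codes exactly.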

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism) : the condition is implicitly ALWAYS because it's an IMM8 format : CALL PC FFh (thus rebooting/alerting with some code placed there, if any; otherwise keep the instruction at FFh equal to INV to make an endless loop)

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.

1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Syntax highlighting for Nano
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing the core
24. Synchronous Serial Debugging
25. MUX trees
26. Flags, PC and IO ports
27. Binary translation
28. Even better register set
29. A better relay-based MUX64
30. Register set again
31. Rename that opcode !
32. Register set again again
33. Yet Another Fork
34. What can it run ?


Core diagram in SVG, added LDCx MUXes

svg+xml - 17.96 kB - 01/17/2018 at 17:38


svg+xml - 6.99 kB - 01/12/2018 at 18:57



Added: license, readme, mustfail...

x-compressed-tar - 36.61 kB - 12/08/2017 at 23:21



Syntax highlighting for the Nano text editor

nanorc - 1.16 kB - 12/08/2017 at 14:43



How the YGREC8 is split and controlled for debug, development and test

svg+xml - 8.55 kB - 12/03/2017 at 16:26



  • What can it run ?

    Yann Guidon / YGDES, 08/16/2018 at 12:34, 0 comments

    #YGREC-РЭС15-bis has 16-bit-wide registers that make it suitable for quite a few things, including running Tetris and Game of Life. However, the #YGREC8 is only 8 bits wide and this limits the range of programs even more. I'll focus only on "toy games" because they are the most attractive applications, while I also consider other uses such as PLC or monitoring.

    Tetris is still somewhat possible but it would be an impractical stretch because the 10 columns exceed the 8 bits of the registers, and the processor is too slow to animate that smoothly : the display would be sheared.

    Tic-tac-toe is a contender.

    Battleship is another good candidate : it's not a hard real-time game and animations are not critical. However I would have to build 2 units and make them communicate somehow... So it would be good to develop communication protocols, later.

    Another good challenge is the SNAKE game. It doesn't require too much computing power and could run fast enough to be enjoyable. The problem is to memorise a linked list of coordinates, which could exceed the DRAM capacity... But there is a not-too-hard solution :-)

    It requires 4 bitplanes :

    • one bitplane is a boolean that says "food"
    • one bitplane is a boolean that says "snake" (can be mapped to the flip dots display)
    • one bitplane says "up/down"
    • one bitplane says "left/right"

    so overall, there are 4 bits per pixel. With a 16×16 pixel array, that's 16×16×4 = 1024 bits, or 128 bytes of DRAM: half the addressing space of one address register.

    • Food is pretty simple : it's set by a random generator from time to time in places where there is no "snake" bit set.
    • Snake is used as a collision condition, it's set by the "head" code and cleared by the "tail" code.
    • The "head code" has a coordinate and a direction : the direction is changed by the button inputs.
      - At each game step, the buttons are scanned, the direction updated, the direction increments/decrements the coordinate and the "snake" bit is set at the new coordinate.
      - If a "food" bit is also present, the food is swallowed (cleared) and a new food is created pseudo-randomly. The tail code is skipped for one cycle, so the snake gets longer.
      - if a wall is touched or the snake bit is already set, game over.
    • The trick is with the tail code :-) Each "head" leaves a sort of "trail" on the "left/right" and "up/down" bitplanes so the tail can follow it, without requiring the storage of a long list of coordinates. It uses quite a lot of room but much less than fully-decoded coordinates. So the "tail code" remembers the coordinates of the tail but instead of reading the buttons, it reads the "trail" left by the head to follow the body.
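The head/tail scheme above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `Snake`, the memory layout (four 32-byte bitplanes packed into one 128-byte array) and the 2-bit direction code spread over the two trail planes are hypothetical, and food coordinates are supplied by the caller instead of a random generator.

```python
W = H = 16
PLANE_BYTES = W * H // 8                     # 32 bytes per bitplane
FOOD, SNAKE, TRAIL0, TRAIL1 = range(4)       # plane indices (assumed order)
DIRS = [(0, -1), (0, 1), (-1, 0), (1, 0)]    # up, down, left, right


class Snake:
    def __init__(self, x=8, y=8, direction=3):
        self.mem = bytearray(4 * PLANE_BYTES)   # 128 bytes in total
        self.head = (x, y)
        self.tail = (x, y)
        self.direction = direction
        self._put(SNAKE, x, y, 1)

    def _addr(self, plane, x, y):
        bit = y * W + x
        return plane * PLANE_BYTES + bit // 8, 1 << (bit % 8)

    def _get(self, plane, x, y):
        byte, mask = self._addr(plane, x, y)
        return bool(self.mem[byte] & mask)

    def _put(self, plane, x, y, value):
        byte, mask = self._addr(plane, x, y)
        if value:
            self.mem[byte] |= mask
        else:
            self.mem[byte] &= ~mask & 0xFF

    def drop_food(self, x, y):
        """Place food only where no snake bit is set."""
        if not self._get(SNAKE, x, y):
            self._put(FOOD, x, y, 1)

    def step(self, new_direction=None):
        """Advance one game step; return False on game over."""
        if new_direction is not None:
            self.direction = new_direction
        x, y = self.head
        # The head leaves its direction behind as a trail for the tail.
        self._put(TRAIL0, x, y, self.direction & 1)
        self._put(TRAIL1, x, y, self.direction >> 1)
        dx, dy = DIRS[self.direction]
        x, y = x + dx, y + dy
        if not (0 <= x < W and 0 <= y < H) or self._get(SNAKE, x, y):
            return False                         # wall or body hit
        self._put(SNAKE, x, y, 1)
        self.head = (x, y)
        if self._get(FOOD, x, y):
            self._put(FOOD, x, y, 0)             # swallowed: tail code skipped
            return True
        # Tail code: clear the cell, then follow the trail left by the head.
        tx, ty = self.tail
        code = self._get(TRAIL0, tx, ty) | (self._get(TRAIL1, tx, ty) << 1)
        self._put(SNAKE, tx, ty, 0)
        dx, dy = DIRS[code]
        self.tail = (tx + dx, ty + dy)
        return True
```

Only two coordinates and one direction live outside the bitplanes, whatever the length of the snake.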

    It shouldn't be too hard, right ?...

    I have a 16×24 flip dots array that leaves 16×8 pixels to display the stats of the game.

  • Yet Another Fork

    Yann Guidon / YGDES, 07/16/2018 at 17:29, 0 comments

    Now that the instruction set is fixed, the architecture is elaborated and the #Numitron Hexadecimal display module  is working, what about making the front panel assembler ? By a nice coincidence, it's the time of the year for the Human-Computer Interface Challenge! So let's fork Y8 and focus on a well-defined sub-project :-)

    #Hardware assembler 

  • Register set again again

    Yann Guidon / YGDES, 03/18/2018 at 10:09, 0 comments

    I know the title is lousy but the previous log, 30. Register set again, was missing illustrations so here they are.

    The most basic unit is a set of 2 bits of storage (DFF or TL) and 2 MUX2.

    Nothing fancy here but the 2×2 tile is copied/mirrored and 2 more MUX2 are added :

    YGREC8 has 8 registers so the 5×2 tile is copied/mirrored once again :

    It might look messy so let's not forget that many wires are shared. Here are some colors to better visualise the wires' functions :

    It looks almost like an ASIC pre-layout and indeed routing is quite easy: some gates simply need to be moved around.

    The above 11×2 tile is a "slice" of one bit, and 3 are tied together to make a group. The MUX2s are in 3 groups of 7 each but I'm not sure which organisation is best. In the pictures below, each color represents one address bit.




    The b) version seems to have a small advantage because red and green are a bit less wide, but the blue still spans the whole width. Maybe the best approach is the one that requires the fewest wire crossings for the overall set.


  • Rename that opcode !

    Yann Guidon / YGDES, 03/11/2018 at 10:23, 5 comments

    I remember when I first tried to understand a microprocessor. I had a book in French that explored the 6809 and I was young and impressed. But I couldn't wrap my head around the concept of the MOV opcode. Does it displace data ? And what happens to the original data ?

    I have since acquired the habit of using MOV, mostly from my heavy use of x86 asm. But looking back at that early confusion, and despite the almost universal use of this mnemonic, I believe it's time to do the "right thing" : rename it to CP.

    PS : somewhat related to 1. Honey, I forgot the MOV


    OK I think I got it now ! CP is not great in the case of immediate values but SET is much better :-)

  • Register set again

    Yann Guidon / YGDES, 03/07/2018 at 08:52, 0 comments

    The pursuit of the Ultimate Register Set Structure progresses. I'm trying to make it more hierarchical and practical for a wider range of technologies (ASIC, FPGA, TTL, transistors).

    I decided to use a parity bit for the register set and the memory. This increases reliability and the 9th bit is already provided by the A3P FPGA anyway. I'm also settling on a 512-byte addressing space, whenever I can, to prevent aliasing issues (but the mapping can be controlled by some bits in the IO space at address 0).

    The redesign of the register set uses bit slices again. 3 slices are grouped and 3 groups make the 9-bit-wide register set. This is near perfect from the fanout point of view and the structure is very easy to place and route.

    Parity is in bit #4 to reduce wire lengths in FPGA and ASIC.

    Each slice has 8 bits of addressable storage and two MUX8.

    The two MUX8 can be either balanced (fan-in={1,3,3}) or not (the classical {1,2,4}), it doesn't make a difference. There will be a fan-in of 7 in each group of 3 slices for all 8 address wires, when using circular permutation.
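A quick check of that load balancing, as a sketch: it assumes each MUX8 is a tree of 7 MUX2 whose three address bits drive 1, 2 and 4 (or 1, 3 and 3) select inputs, and that slice k of a group rotates the assignment of address wires by k positions. The function name `group_load` is mine.

```python
def group_load(levels, slices=3, rotate=True):
    """Load seen by each address wire across one group of slices.

    `levels` lists how many MUX2 select inputs each address bit drives
    inside one MUX8, e.g. (1, 2, 4) or (1, 3, 3). With `rotate`, slice k
    shifts the wire assignment by k positions (circular permutation).
    """
    n = len(levels)
    load = [0] * n
    for k in range(slices):
        shift = k if rotate else 0
        for bit in range(n):
            load[(bit + shift) % n] += levels[bit]
    return load


print(group_load((1, 2, 4)))                 # [7, 7, 7]
print(group_load((1, 3, 3)))                 # [7, 7, 7]
print(group_load((1, 2, 4), rotate=False))   # [3, 6, 12]
```

Both tree shapes sum to 7 select inputs per MUX8, which is why the permutation evens them out equally well.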

    The storage part has more variations and options, depending on the technology.

    For FPGA the bits are made of DFF with enable. The clock must feed all 72 bits and the enable signal is split into 8 lanes, one for each register. No reset signal is required (despite complaints from the synthesiser). It's possible to go further by removing the Enable signal : the clock signal is split into 8 lanes, so yes, that's "clock gating"...

    Even further : a DFF is made from a couple of latches clocked on opposite signals. The first latch of each bit in a lane can be "factored" to reduce parts count in a discrete system. Instead of 16 latches to store 8 bits, only 9 remain (we saved almost one half of the parts !) which is good for TTL, transistors, ASIC... but clock sequencing is more complex. This approach is a bit slower but also saves power because the clock gating reduces the activity on the clock network by an 8:1 ratio.

    3 slices make a group where the control lines get a circular permutation to balance the load on the control lines. However, the 8 "enable" lanes would become all shuffled (and prove hard to route) if all the MUX8 are shuffled, so each of the slices must be routed correctly from the MUX8s to keep the right order of the latches.

    The groups have a fan-in of 1 for each signal (except data input if there is a direct connection to the DFF). The 2×3 MUX8 driving lines get amplified by one buffer each.

    On A3P, each group has an XOR3 at the data input to generate parity.

    Then at the higher level, 3 groups are assembled to create a 9-bit register set. The fan-in of the MUX8 is only 3. For other technologies, the 8 data input bits are parity-ed with a tree of XOR2 and the result is placed in the middle slice. The 8 latch enable lanes should be "straight" and easy to route.

    Two other parity checks should be implemented at the output ports.
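A small sketch of the parity path, under assumptions: even parity (the polarity is not stated here), data bits 0-3 below and 4-7 above the parity bit at position 4, and hypothetical helper names `parity8`, `pack9` and `unpack9`.

```python
def parity8(byte: int) -> int:
    """Parity of an 8-bit value, folded like a 3-level tree of XOR2."""
    byte &= 0xFF
    byte ^= byte >> 4
    byte ^= byte >> 2
    byte ^= byte >> 1
    return byte & 1


def pack9(byte: int) -> int:
    """Build the 9-bit stored word with parity in bit #4 (assumed layout)."""
    low = byte & 0x0F
    high = (byte >> 4) & 0x0F
    return (high << 5) | (parity8(byte) << 4) | low


def unpack9(word: int):
    """Return (byte, ok), where ok is the parity check at an output port."""
    low = word & 0x0F
    stored_parity = (word >> 4) & 1
    high = (word >> 5) & 0x0F
    byte = (high << 4) | low
    return byte, parity8(byte) == stored_parity
```

Any single flipped bit in the 9-bit word makes `unpack9` report `ok == False`.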

  • A better relay-based MUX64

    Yann Guidon / YGDES, 03/02/2018 at 08:56, 0 comments

    I came up with a different system for the MUX64 (required by the memory system) that doesn't use the CCPBRL system :

    It uses full on/off switching instead of constant biasing so it might be less sensitive to individual drift in characteristics. This means less binning. However, there could be one side that is more ON than the other and heat more.

    There is a big trick as well : the capacitor replaces freewheeling diodes to "precharge" the opposite branch when the relay switches to the other side. The question of the capacitance is important because I doubt that 100nF will be enough and the 100µF capacitors are polarised, they would be destroyed...

    I have to evaluate the pros & cons of this method versus the CCPBRL one. For example, CCPBRL has only static/medium current and homogeneous/distributed heat but requires another higher-voltage power supply rail and requires very precise power supply regulation.

  • Even better register set

    Yann Guidon / YGDES, 02/27/2018 at 11:41, 0 comments

    I think I cracked it :-)

    The MUX8 are all identical and a circular permutation controls 7 bits. The last bit has a different permutation to reach the ideal fanout of the gates. Hopefully this will let me make a better register set, both with relays (easier construction) and with VHDL (shorter, more generic code).

    Better :

    I'm just trying to reduce the length of the wires and the long crossings :-)

    Oh, that's even better :

    The sequence of permutations is :

    I now have to rewrite my register set VHDL code...

  • Binary translation

    Yann Guidon / YGDES, 02/20/2018 at 01:46, 0 comments

    One thing I've been thinking about : since the YGREC8 is a sort of subset of the YASEP ISA, wouldn't it be nice and easy to emulate the YGREC8 on the YASEP with a pipeline stage that performs binary translation of the YGREC8 instructions ?

  • Flags, PC, IO ports and interrupts

    Yann Guidon / YGDES, 02/15/2018 at 04:12, 0 comments

    Interrupt handling should be seriously considered because we'll need them one day or another. This means that the complete state of the core must be saved and restored by suitable hardware and software.

    • A first issue is how to save the flags (C, S and Z). Attempting to save them by conditional instructions will destroy their values... Upon an IRQ signal, they would be saved to 3 backup bits which can be read and written by an IO port (port 0 ?). Exit from IRQ (and restoration of the IRQ mask) would occur when writing the value back to the port... or something like that.
      Cost : 3 DFF with enable, 3 MUX for the feedback to the flags, 3 MUX to select where the DFF input comes from, and some glue logic.
    • A second issue is to save the PC. Actually, it's PC+1 that must be saved (after LDC has completed). Again : the value can be saved to an IO port (port #1 ?) where it can be read and written (with the proper MUXes).
    • A 3rd issue is that some scratch space is required to save a couple of registers (such as the address registers) to allow memory to be used to save the other registers. At least, ONE backup is required, probably A1, it can be automatic (like PC and Flags) but it's not required, the Interrupt Service Routine can start with OUT A1 2 for example (2 being the scratch register's address). If more scratch registers are provided (let's say 2 or 3) then very short ISR can be written, with no need to touch memory. However, memory is the main channel of communication between threads so a compromise is 2 scratch registers.

    Overall, this means that the very first IO port addresses are reserved for core functions. There are 4 registers that can be written from the core's internal state, as provided by the entity's ports (PC+1, A1, A2 and flags are available outside of the datapath because they are required for the debug system). So far the map is :

    • 00h : Flags (C, S, Z) and Interrupt control backup register. Values come from IO write port or core. Setting bit 0 triggers restoration of the previous states (sort of "return from IRQ"). Bit 1 would enable/disable the IRQ mask.
      20180307: Two other bits control the mapping of memory banks for A1 and A2. These 8 bits are almost fully used.
    • 01h : PC backup : value comes from IO port or core (this allows IRQ re-entrance)
    • 02h : ScratchRegister1 : copied from A1 or from IO write port (result bus), used only by the ISR.
    • 03h : ScratchRegister2 : copied from A2 or from IO write port (result bus), used only by the ISR.

    Saving A1 and A2 directly with dedicated hardware saves one or two cycles of latency (and some precious bytes) when servicing IRQs but can also make the core harder to route... So they might be simple registers (which saves a MUX as well as the required wires). Or they can be "shadow" registers, written every time the corresponding A register is being written (but the value goes through the RESULT bus, while the OUT bus is connected to DST, so it's awkward and would increase the overall electrical activity of the circuit, which is less good for power draw).

    One nice side-effect is : this avoids creating an opcode for RTI (ReTurn from Interrupt) because it is detected by the following conditions : OPCODE=OUT (5 bits), IMM8=0 (8 bits), and DST[0]=1 (1 bit). 14 bits are easy to check in the pipeline.
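The 14-bit check boils down to one mask-and-compare on the instruction word. In this sketch the field positions (opcode in bits 15..11, IMM8 in bits 10..3, DST in bits 2..0), the numeric value of `OPCODE_OUT`, and the reading of DST[0] as the lowest bit of the DST field are all placeholders, not the real opcode map.

```python
OPCODE_OUT = 0b11010            # placeholder value, not the real opcode

OPCODE_MASK = 0b11111 << 11     # 5 bits
IMM8_MASK = 0xFF << 3           # 8 bits
DST0_MASK = 0b1                 # 1 bit

RTI_MASK = OPCODE_MASK | IMM8_MASK | DST0_MASK    # 14 bits checked
RTI_MATCH = (OPCODE_OUT << 11) | (0 << 3) | 1     # OUT, port 00h, DST[0] = 1


def encode_out(imm8: int, dst: int) -> int:
    """Assemble a hypothetical 'OUT dst imm8' instruction word."""
    return (OPCODE_OUT << 11) | ((imm8 & 0xFF) << 3) | (dst & 0b111)


def is_rti(instr: int) -> bool:
    """True when the word matches the implicit return-from-interrupt pattern."""
    return (instr & RTI_MASK) == RTI_MATCH
```

The two remaining DST bits are "don't care", so several encodings trigger the same return.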

    The other nice aspect is that this mechanism is entirely optional : it can be disabled/removed if IRQs are not supported by the core.

  • MUX trees

    Yann Guidon / YGDES, 02/05/2018 at 15:47, 0 comments

    At this moment I work on a more formal code for the MUX parts. In other words I'm digging again in a pet topology project. This makes the VHDL code better, because I realise I use MUX8 in various places yet I don't get the best out of them. For example, even though I built the Register Set out of balanced control trees, I didn't use this technique for the conditions. So I started writing MUX8 components in VHDL... I haven't uploaded the new code archive but when I do, look at MUX8.vhdl. I should also rewrite the REG8 module by using these enhanced MUX8.

    The next step is the large MUX64 used by the serial debug system (see 24. Synchronous Serial Debugging). I'd like to design it algorithmically but I haven't cracked the algorithm yet. Is there a simple one ?

    20180227 : algorithm cracking in progress. Meanwhile, I already have one topology/solution for MUX64 :

    It's going to be fun to write this in VHDL...


Bartosz wrote 11/08/2017 at 16:40

Will this work on Epiphany, oHm or another cheap machine?

Yann Guidon / YGDES wrote 11/08/2017 at 18:07

I'm preparing a version that would hopefully use less than half of an A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !


Bartosz wrote 11/14/2017 at 14:13

The question is the price and the possibility to buy

Yann Guidon / YGDES wrote 11/14/2017 at 16:08

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCB and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
