A byte-wide stripped-down version of the YGREC16 architecture

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of DRAM (plus one parity)
* 8 byte-wide registers
* fewer relays than the YGREC16
and that's pretty much it.

I give up on the idea of playing the Game of Life (the forte of #YGREC-РЭС15-bis) but I design a VHDL version because @llo sees the YGREC8 as a perfect replacement for PICs for his #SteamBot Willie !

A significant reduction of the register set's size is required so I/O must be managed differently, not through the register set (instruction or RAM-mapped, to be determined). The register map is expected to be:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

I shrunk the instruction word down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field.

I have swapped the condition field and the ALU code field, which is now a more classical opcode.

The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.

There are two more classical instruction forms : either an IMM8 field, or a source & condition field, combined with the destination field and a small opcode. The source field can also become a short immediate field (3 bits only but essential for conditional short jumps or increments/decrements).

The opcode field has 4 bits and the following values:

Logic group :

  • OR  => Reg OR Reg does not change Reg
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Shift group (optional)

  • SHL
  • SHR
  • SAR
  • ROL (or ROR ?)

Control group:

  • IN
  • OUT
  • CALL
  • MOV

The COND field has 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available.  All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S  (Sign, MSB)
  • B0, B1, B2, B3 (input signals)

Instruction code 0000h should map to NOP, and the NEVER condition.

Instruction code FFFFh should map to INV, which traps or reboots the CPU : the condition is implicitly ALWAYS because it's an IMM8 format : MOV FFh PC (thus rebooting/alerting with some code placed there, if any, otherwise keep the instruction at FFh equal to INV to make an endless loop)

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.

1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8   
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update


Starting to code the assembler.

x-compressed-tar - 19.08 kB - 11/17/2017 at 09:51



Executes its first instructions, and other enhancements

x-compressed-tar - 16.41 kB - 11/16/2017 at 05:28



Added some BRAM blocks and ... a core ! (it compiles but needs to be tested now)

x-compressed-tar - 14.65 kB - 11/14/2017 at 06:41



much better integration now.

x-compressed-tar - 9.38 kB - 11/13/2017 at 02:19




x-compressed-tar - 7.53 kB - 11/12/2017 at 05:22



  • Structure update

    Yann Guidon / YGDES, 4 days ago

    I found that I made a few errors in my previous diagram and I have updated several aspects. Here is only an early draft, until @llo  makes a vector version.

    • The Imm8 field is now integrated at the right stage of the datapath.
    • I/O ports are also integrated, according to the latest developments.
    • The Z/S/C flags now have their own write-enabled registers.
    • NPC appeared, because the A3P SRAM blocks have their own latches already. It might also be useful for debug.
    • The R/I3 condition is not right, it should be : R/I8 or (R/I3 and /R/I8)
    • I added the SHL unit
    • D1 and D2 are now explicitly outside of the core, stored in the SRAM block buffers.

    The condition block on the left is not well laid out but hopefully @llo  will make it better :-D

    Back to VHDL now...

    And I updated the instruction format diagram :

    yup the condition Negation flag has moved... I don't know why. But it's coherent with the VHDL code.

  • Timer(s)

    Yann Guidon / YGDES, 6 days ago

    Beyond the typical GPIO, peripherals include serial transmission and timers for periodic interrupt generation. Both rely on generating a programmable frequency, and the #ProAsic3-Stamp  has a pretty fast 50MHz clock source : the quartz frequency must be divided.

    This creates the need for high-speed counters, which require at least 16 bits : 50MHz/2^16 = 762.94Hz. One timer is enough for feeding the serial port but longer periods require another timer, cascaded with the first one.

    I have created another custom incrementer : INC16 is derived from the one used by PC. I extended it to 16 bits and I optimised it for speed, but the really interesting part is that I get access to the carry chain. This chain provides pulses that can then be tapped into for various purposes: I'm thinking about a binary predivider to feed other timers or clock sinks. The delicate part is how to avoid having another fanout, which drives a MUX or something like that.

    A pretty unusual feature of the timers is they count up, not down. So if you want to generate a pulse after N cycles, you have to load the count register with 65536-N. This is easily approximated by negating all the bits during load, for example (inverting the bits gives 65535-N, so the count is off by one unless 1 is added). Readback might give confusing values so I wonder if it's worth providing the feature.
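The reload arithmetic for an up-counting timer can be sketched in a few lines of Python. This is only a behavioural model, not the INC16 unit itself: it assumes the pulse is produced when the 16-bit counter wraps from 65535 to 0, and the function names are hypothetical.

```python
CLOCK_HZ = 50_000_000  # clock frequency quoted in the log
WIDTH = 16             # counter width quoted in the log


def rollover_frequency(clock_hz=CLOCK_HZ, width=WIDTH):
    """Frequency of the wrap-around pulse of a free-running counter."""
    return clock_hz / (1 << width)


def reload_exact(n, width=WIDTH):
    """Two's complement of n: the counter wraps after exactly n increments."""
    return (-n) & ((1 << width) - 1)


def reload_inverted(n, width=WIDTH):
    """Bitwise NOT of n: cheaper in hardware, but one count longer."""
    return ~n & ((1 << width) - 1)


def cycles_until_wrap(start, width=WIDTH):
    """Number of increments before the counter wraps to 0 (assumed pulse)."""
    return (1 << width) - start


if __name__ == "__main__":
    print(rollover_frequency())
    print(reload_exact(1000), cycles_until_wrap(reload_exact(1000)))
    print(reload_inverted(1000), cycles_until_wrap(reload_inverted(1000)))
```

With these assumptions, loading the inverted value makes the timer fire after N+1 cycles instead of N, which may or may not matter depending on how the pulse is used.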

  • I/O registers

    Yann Guidon / YGDES, 6 days ago

    The last opcodes that are not yet well defined are IN and OUT.

    This step is necessary now because I'm writing all the sub-modules, and this one must also be done before I put everything together.

    The annoying bit is that I have only 2 opcodes and 12 bits with already predefined fields. I made a few compromises but I hope that they are flexible enough.

    One interesting aspect is that the values and address don't use the same paths, because they go to different busses and are decoded differently. There is no real "bus" with a common address path and bidirectional data lane. This brings some flexibility.

    Another concern is the latency : it's possible to decode maybe 16 registers but more will increase the length of the critical datapath (CDP). However, limiting the address range to 16 I/O ports would be a big mistake... I have found a way to get 256 addresses, in direct and indexed mode. Well, only 128 write registers in direct mode, though, but it must be enough, right ?

    IN :

    • The SRC operand (REG/Imm3/Imm8) gives the I/O register address, for a range of up to 256 read-only registers, in immediate or register mode. However the latency might not allow more than 8 registers in A3P. In Imm8 mode, there might be enough slack for 16 or 32 registers. Imm8 is congruent to Imm3 so it's out of range of Imm3.
    • The DST field is used normally, with the number of the core register to write.
    • The instruction is conditional in Reg/Imm3 mode.

    OUT :

    • The SRC field (REG/Imm3/Imm8) gives the value to write to the I/O register. Imm3 mode gives a -4 to +3 range (good for small fields, clearing or filling), and Reg/Imm8 mode gives the full 0-255 range.
    • The DST field gives the address of the I/O register to write to. This is limited to the 0-7 range only. It is extended to 0-127 by confiscating the 4 condition bits in the Reg/Imm3 mode.
    • Yes, OUT breaks the cherished orthogonality dogma and it's a bit of a kludge but the loss of conditionality should not be severe, compared to the other aspects. And 128 output registers should be enough for an 8-bitter.

    The first implementation would probably implement only 8 registers, which is enough for a dumb application and even overkill for a LED blinker. However, add timers and other peripherals and that number will be really too small very fast.

    4 I/O registers are reserved :

    • Register n°0 is an index register, to extend addressing to 255 registers, read AND write.
    • Register n°1 is a data register, for read and write.
    • Register n°2 is an "offset" register used when using the 0-7 range
    • Register n°3 is reserved and unallocated. Maybe a data register that auto-increments Reg0 upon every access ?

    Not only does this extend the range for writes but it also allows scratchpad areas and I/O configuration with a code loop, instead of wasting precious code space.

    Register IO2 is a more practical trick for selecting a group of 4 registers at once. It's useful for small/medium peripherals, a setup sequence uses just one instruction to select the group, and at most 4 more instructions to change the settings. It might be a way to extend to a ridiculous 1024 registers but only the required bits are implemented so don't play with the MSB.

    Well, this is getting too complex...

    The above system creates too many problems so I have to modify it.

    First, the same address bus should be used for IN and OUT : this reduces the complexity, fanout, slowness etc. Which means that IN and OUT instructions have the same format.

    There is also the need to get the address as early as possible during the instruction cycle. Going through the register set is not practical so only immediate addresses are allowed. Imm8 becomes the only source of port addresses (great for keeping the gates count low). This leaves one unused bit : R/I8 is now obsolete, it could...


  • And a shifter, and a register set...

    Yann Guidon / YGDES, 7 days ago

    The ALU is tested and other units have been written.

    I have implemented a small barrel shifter/rotator, in 2 versions : one is smaller, the other is faster. Now all 12 ALU opcodes are covered.

    I have also implemented the MUX-based register set, with 2 versions : the first is a classical tree (like I implemented for the #Numitron Hexadecimal display module ) with a fanout of 8-24 for the address lines. The second is more sophisticated, with 3 interleaved trees, with a fanout of 20-18-18, close to the theoretical 18.6 !

    All units have their own small testbench in their own directory. I will write the rest very soon.

    And now, an incrementer. I'm moving to the I/O system.

  • Programming the YGREC8

    Yann Guidon / YGDES, 11/10/2017 at 09:26

      Here, I'm giving a little crash course about how the YGREC8's instructions are translated from text to binary. Or, if you prefer, how the instructions are assembled. It's a mix of old and new and I favour simplicity and coherence, so the code is easy to understand and write. Fortunately, it is also very easy to assemble !

      Let's start with how the instructions are represented in binary, with the diagram from the front page:

      The leftmost fields represent the least significant bits so the picture should be mirrored instead.

      Then you can construct or write an instruction by following the diagram, reading it from left to right :

      1. Start with the opcode, one out of 16 possible words.
        For example : ADD, for addition.
        The opcodes are grouped in 4 types: some perform boolean operations (OR, XOR, AND and ANDN), some perform arithmetic operations (ADD, SUB, from which CMPU and CMPS are derived), some move bits around (SHL, SHR, SAR, ROR), and the remaining opcodes move bytes around (IN, OUT, CALL, MOV).
      2. The opcode is followed by the name of a destination register. This is also one of the operands of arithmetic and logic operations.
        For example : "ADD R1" will put the result of an addition into the register R1, however one operand is missing.
      3. Then the operand can be a "long" immediate : Imm8 ranges from -128 to +127.
        For example : "ADD R1, 123" will add 123 to the register called R1.
        Another possibility is to use the short immediate, Imm3, ranging from -4 to +3.
        The last possibility is to use another register as operand : ADD R1, R2 will add R2 to R1.
      4. Eventually, in the last two cases, you can tell the processor to abort the operation if a given condition is met (or not). Typically, you can test if the Carry flag is set (or not), if the last result was Zero (or not) or its sign (Negative or Positive). The corresponding optional conditions are : IFZ, IFNZ, IFC, IFB/IFNC (the same), IFP, IFN. Other conditions are possible and will be defined later, the current names are IF0, IF1, IF2, IF3, IFN0, IFN1, IFN2, IFN3.
        For example : ADD R1 1 IFC will add 1 to R1 if the carry flag was set.
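The four steps above can be sketched as a tiny parser that splits a line of text into its fields and picks the instruction form. It stops before the binary encoding, because the bit positions come from the diagram. The `parse` function and the dictionary it returns are hypothetical, and choosing Imm3 whenever the value fits in -4..+3 is an assumption.

```python
OPCODES = {"OR", "XOR", "AND", "ANDN", "CMPU", "CMPS", "SUB", "ADD",
           "SHL", "SHR", "SAR", "ROR", "IN", "OUT", "CALL", "MOV"}
REGISTERS = {"D1", "A1", "D2", "A2", "R1", "R2", "R3", "PC"}
CONDITIONS = {"IFZ", "IFNZ", "IFC", "IFNC", "IFB", "IFP", "IFN",
              "IF0", "IF1", "IF2", "IF3", "IFN0", "IFN1", "IFN2", "IFN3"}


def parse(line):
    """Split 'OPCODE DST SRC [COND]' into fields and select the form."""
    tokens = line.replace(",", " ").upper().split()
    if len(tokens) not in (3, 4):
        raise ValueError("expected: OPCODE DST SRC [CONDITION]")
    opcode, dst, src = tokens[:3]
    cond = tokens[3] if len(tokens) == 4 else None
    if opcode not in OPCODES:
        raise ValueError("unknown opcode: " + opcode)
    if dst not in REGISTERS:
        raise ValueError("unknown destination register: " + dst)
    if cond is not None and cond not in CONDITIONS:
        raise ValueError("unknown condition: " + cond)
    if src in REGISTERS:
        form, value = "REG", src
    else:
        value = int(src, 0)  # raises ValueError on junk
        if -4 <= value <= 3:
            form = "IMM3"    # assumption: prefer the short form when it fits
        elif cond is not None:
            raise ValueError("a condition needs a register or Imm3 operand")
        elif -128 <= value <= 127:
            form = "IMM8"
        else:
            raise ValueError("immediate out of range")
    return {"opcode": opcode, "dst": dst, "form": form,
            "src": value, "cond": cond}


if __name__ == "__main__":
    for text in ("ADD R1, 123", "ADD R1, R2", "ADD R1 1 IFC"):
        print(text, "->", parse(text))
```

The last check mirrors step 4: a condition is only accepted with a register or a short immediate, so "ADD R1 123 IFC" is rejected.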

      So yes, if you want to write an instruction, just follow the above diagram and choose the branch that corresponds to your needs and constraints. The 3 possible combinations of operands should be enough to solve most programming problems. That, and the way the registers work and are used.

      On top of that, the idea is the same, for writing assembly language in text, or when using the "hardware assembler" of the front panel !


  • ALU in VHDL, day 2

    Yann Guidon / YGDES, 11/10/2017 at 06:03

    Against all odds, the first attempt at a VHDL ALU worked the very first time. I have not found any error in the logic and adder code, can you believe it ?

    I must admit that the experience with the YASEP and its long evolution made writing this ALU a breeze. I was prepared for many of the usual "considerations" to take into account and I looked around a bit for some old tricks. However the Carry LookAhead logic is a first for me. In case this implementation is slower than standard/custom implementations, I'll make a version with the standard "+" operator, users will try both and keep the fastest one.

    There are some changes today : there are two result outputs, one for the ROP2 and the other for the ADD. This is because they have wildly differing latencies and the rest of the core must do the proper MUXing. This will depend on other parameters that are outside of the ALU. The Z and S flags are outside, as well.

    There is another "idiosyncrasy" : the borrow bit is inverted from the carry bit, to save a XOR in the Critical DataPath. This is a bit confusing at first... Like the Z flag that is 0 when the result is zero (because it's just a big OR). The Sign flag however is just a copy of the MSB, nothing weird there :-P

    Anyway : Carry occurs when Cout=1, but borrow occurs when Cout=0. And the Cin must be set for SUB/CMP to work.
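The carry/borrow convention can be checked with a short behavioural sketch. It models subtraction the usual way, as an addition of the inverted operand with Cin forced to 1. The names `add8` and `sub8` are hypothetical, and the sketch uses generic operands a and b rather than asserting which of SRC or DST goes through the inverter.

```python
def add8(a, b, cin):
    """8-bit addition returning (result, cout)."""
    total = (a & 0xFF) + (b & 0xFF) + (cin & 1)
    return total & 0xFF, total >> 8


def sub8(a, b):
    """a - b computed as a + NOT(b) + 1, so Cin must be set."""
    result, cout = add8(a, ~b & 0xFF, 1)
    borrow = (cout == 0)          # borrow is the inverse of the raw carry
    z_raw = int(result != 0)      # "big OR" of the bits: 0 means result is zero
    return result, cout, borrow, z_raw


if __name__ == "__main__":
    print(sub8(5, 3))   # no borrow expected, Cout = 1
    print(sub8(3, 5))   # borrow expected, Cout = 0
```

Running the loop over all 65536 operand pairs confirms that Cout=0 exactly when a < b, which is the "inverted borrow" described above.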

  • YGREC in VHDL, ALU redesign

    Yann Guidon / YGDES, 11/08/2017 at 01:11

    @llo couldn't wait to put the YGREC8 in a #ProAsic3-Stamp ! So a VHDL version seems to be required pretty fast.

    The core itself is quite trivial to write in VHDL : the datapath is a bunch of MUX. There is close to no sequencing going on, except to access memory. Actually the only part that requires care is the ALU, which is not well defined in detail.


    In the first iteration, I'll just ignore the shifter unit, though it's easy to make a 8-bits version from the 16-bits version designed for the #Discrete YASEP. We're left with the ROP2 and ADD/SUB units, plus the PASS function (required by CALL and MOV).

    ROP2/ADD is not black magic either. MOV is tricky though because it's mainly some kind of operation with DST disabled. Or I could pull the old MUX trick but to save gates, the XOR, AND and OR are used by the Generate/Propagate logic of the adder... and it's good to save gates.

    I'm trying to optimise the logic so I move the opcodes around and I get this :

    F1, F0 \ F3, F2    ROP2 (00)    ADD (01)
    00                 OR           CMPU *
    01                 XOR          CMPS *
    10                 AND          SUB *
    11                 ANDN *       ADD

    The opcodes marked with * (shown in bold in the original table) are those that use the inverter on the DST operand. I moved them so the boolean equation that triggers the DST XOR layer is : (F1 and F0) XOR F2. The resulting diagram :

    This logic covers all the ROP2 and PASS functions, as well as the initial parts of the ADD/SUB logic (the DST XOR layer and the carry/propagate gates AND and OR).
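As a sanity check, the equation (F1 and F0) XOR F2 can be evaluated over the eight ROP2 and ADD opcodes, using the opcode values listed in the header of ALU8.vhdl further down in this log. The helper names are hypothetical.

```python
# Opcode names indexed by (F1, F0) = 00, 01, 10, 11, taken from the
# OPCODES comment in ALU8.vhdl.
GROUP_ROP2 = ("OR", "XOR", "AND", "ANDN")    # F3, F2 = 0, 0
GROUP_ADD = ("CMPU", "CMPS", "SUB", "ADD")   # F3, F2 = 0, 1


def invert_dst(f2, f1, f0):
    """Control bit of the DST XOR layer: (F1 and F0) XOR F2."""
    return (f1 & f0) ^ f2


def inverted_opcodes():
    """List the opcodes for which the DST operand is inverted."""
    names = []
    for f2, group in enumerate((GROUP_ROP2, GROUP_ADD)):
        for code, name in enumerate(group):
            f1, f0 = code >> 1, code & 1
            if invert_dst(f2, f1, f0):
                names.append(name)
    return names


if __name__ == "__main__":
    print(inverted_opcodes())
```

The inverter is selected for ANDN, CMPU, CMPS and SUB, and left off for OR, XOR, AND and ADD, which is what one would expect from operations that need a complemented operand.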

    There is still some headroom for the PASS branch to add a MUX that brings data in the pipeline from the input ports, for example.

    Now, to get the adder, some Carry LookAhead logic is required, and it's already covered in A reasonable discrete ALU. Since we use ProASIC3's 3-input "tiles", and the width is only 8 bits, it makes sense to partition the CLA into 3-bit blocks, made of some repetitive logic. According to this simulator, the CLA3 approach seems to be the fastest, which indeed is the closest to sqrt(8). The interactive script even generates a diagram:

    The carry-out is not shown but not hard to figure out: it's a simple use of the input-less MSB of the CLA2. A CLA3 is shown here:

    (Oh, do you see the typos ?)

    The equations for the CLA3 units are:

    inputs: Cin, G0, G1, G2, P0, P1, P2;
    C0 = G0 OR (Cin AND P0);
    C1 = G1 OR (Cin AND P0 AND P1)        OR (G0 AND P1);
    C2 = G2 OR (Cin AND P0 AND P1 AND P2) OR (G0 AND P1 AND P2) OR (G1 AND P2);

    The single-bit adders are like a full-adder but with the Propagate and Generate outputs added:

    Inputs: Cin, A, B;
    P = A XOR B; (already computed for ROP2)
    G = A AND B; (done too)
    S = P XOR Cin;
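The CLA3 and single-bit equations above can be verified exhaustively with a small Python model. This is a sketch under assumptions, not the VHDL: the 8 bits are padded with a ninth position where P=G=0 so that three identical CLA3 blocks can be used (the log describes a CLA2 for the top block instead), and the blocks simply pass their last carry to the next one.

```python
def cla3(cin, g, p):
    """Carry outputs of one 3-bit lookahead block (equations from the log)."""
    g0, g1, g2 = g
    p0, p1, p2 = p
    c0 = g0 | (cin & p0)
    c1 = g1 | (cin & p0 & p1) | (g0 & p1)
    c2 = g2 | (cin & p0 & p1 & p2) | (g0 & p1 & p2) | (g1 & p2)
    return c0, c1, c2


def add8_cla(a, b, cin):
    """8-bit adder built from P/G cells and three CLA3 blocks."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(8)]  # P = A XOR B
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(8)]  # G = A AND B
    p.append(0)  # padding bit 8 (assumption, stands in for the CLA2 block)
    g.append(0)
    carry_in = []          # carry entering each bit position 0..8
    carry = cin
    for base in (0, 3, 6):
        c0, c1, c2 = cla3(carry, g[base:base + 3], p[base:base + 3])
        carry_in += [carry, c0, c1]
        carry = c2
    result = sum((p[i] ^ carry_in[i]) << i for i in range(8))  # S = P XOR Cin
    cout = carry_in[8]     # carry leaving bit 7
    return result, cout


if __name__ == "__main__":
    print(add8_cla(200, 100, 0))
```

Comparing `add8_cla` against plain integer addition for every operand pair and both carry-in values is a cheap way to catch a typo in the equations before writing the VHDL.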

    Here is the result so far:

    -- YGREC8/ALU8.vhdl
    -- created mer. nov.  8 07:21:19 CET 2017 by Yann Guidon (
    -- Released under the GNU AGPLv3 license
    -- This is the arithmetic and logic unit of YGREC8,
    -- performing the add/sub and ROP2 operations on 8 bits.
    -- OPCODES:
    -- F3 F2 F1 F0
    -- 0  0  0  0  OR
    -- 0  0  0  1  XOR
    -- 0  0  1  0  AND
    -- 0  0  1  1  ANDN
    -- 0  1  0  0  CMPU
    -- 0  1  0  1  CMPS
    -- 0  1  1  0  SUB
    -- 0  1  1  1  ADD
    -- 1  0  0  0
    -- 1  0  0  1
    -- 1  0  1  0
    -- 1  0  1  1
    -- 1  1  0  0
    -- 1  1  0  1
    -- 1  1  1  0  CALL (PASS)
    -- 1  1  1  1  MOV (PASS)
    Library ieee;
        use ieee.std_logic_1164.all;
    entity ALU8 is
      port( F0, F1, F2, F3, Cin : in std_logic;
        SRC, DST : in std_logic_vector(7 downto 0);
        RESULT  : out std_logic_vector(7 downto 0);
               Cout : out std_logic);
    end ALU8;
    architecture rtl of ALU8 is
      Signal  ALU_XOR, P, G, S, C,
     DSTX, Complement

  • The YGREC debug system

    Yann Guidon / YGDES, 11/06/2017 at 22:36

    The YGREC family is pretty straightforward. The main sequential elements of the core are the register set and the flags, so everything else is combinatorial and the control signals directly map to the instruction word. So we can control the core simply by playing with the instruction word, even when not executing any program.

    This leads to a similarly very simple debug system that hijacks the instruction memory system, thus cutting the whole processor in half and adds an optional slice in the middle, as shown in this new diagram :

    The beauty is that it's totally modular, so far I have focused on the assembler and disassembler but other modules include the control of the instruction address, the comparators that make up the breakpoints sub-system, and the sequencer to start/step/stop the datapath.

    A finished YGREC can work without the debug modules, or any number of them. In a FPGA or ASIC, this system is also easy to implement and it provides a single point of control that can be read and written using a scan chain/shift register, using SPI or JTAG.

    The minimal internal state and the debugging information are:

    • Current instruction address (can be selected as the core's PC or a forced value)
    • Current instruction word (either coming from memory or a forced value)
    • the SRC and DST operands, as well as the RESULT value
    • the C/S/Z flags

    With all these values, as well as the ability to inject any information through the instruction bus, the whole core can be dumped and controlled.

    Note that the instruction word can be controlled and give immediate results, yet the core is halted. If the result is satisfying, the user can validate it with a "step" pulse that will record the result in the register set and the flags. So the instruction can change at will, without any effect, even in the middle of a program: this can help dump the contents of the registers or even memory regions. The whole state can be read and changed, before the program resumes its normal execution path.

    The funny side effect is that, to "explore" the core through such a debug interface, a sequence of commands is required, which looks a lot like a "program" except without the possibility to jump or loop. Debug thus uses snippets of code that use the normal CPU assembler tool, only extended to support sequencing and debug signals. The processor is now its own debugging language !

  • Breakpoints !

    Yann Guidon / YGDES, 11/05/2017 at 17:53

    Reading ... and a familiar thought resurfaced !

    I forgot the breakpoint(s).

    At least one trap on a given instruction address would be nice, right ?

    This happened before with the #Discrete YASEP, in the log "6. Dear SN74HC688". Other handy traps would be triggered on writing to a given register or on detection of a given value on the result bus. These trap signals would really help, not just with debugging the software, but also the hardware !

    Another thought : add a cycle counter. However the electromechanical panel versions I have found are limited to 8 to 10 pulses per second only, while I expect more than twice that speed...

    Examples found on eBay:

    Maybe a predivider would help but... the missing least significant digit would be more than welcome.

    Another eBay image from

    And this one is limited to 10 I/S as well :

    So I doubt the claim that this Chinese model works at 60Hz. The line "Input Power : DC 24V 50/60Hz" makes me believe it's an editing error.

    But mostly, these counters don't provide an output signal when a given count is reached so their usefulness is "only informative". A nice decorative gadget.

    Anyway, the bitwise comparison of two numbers is pretty easy with relays. First, wire the relays in series to make a large AND gate. Then complement the relays with other relays to make XORs. This amounts to 2 relays per bit. There is also a version of XOR with 1 relay per bit but it relies on both operands to use rail-to-rail signals (not just on/off).
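The logic of that comparator can be sketched as follows: one "bits are equal" test per bit (the XOR stage), all chained in series (the large AND). This is only a logical model of the relay wiring; the function names and the default width of 8 bits are assumptions.

```python
def bit_match(a_bit, b_bit):
    """1 when both bits are equal (XOR followed by an inversion)."""
    return 1 - (a_bit ^ b_bit)


def breakpoint_hit(address, setpoint, width=8):
    """Series chain: the signal gets through only if every bit matches."""
    chain = 1
    for i in range(width):
        chain &= bit_match((address >> i) & 1, (setpoint >> i) & 1)
    return bool(chain)


if __name__ == "__main__":
    print(breakpoint_hit(0x2A, 0x2A))
    print(breakpoint_hit(0x2A, 0x2B))
```

A single mismatching bit opens the chain, just like one open contact in a series string of relays.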

    For ease of design and economy, the values to compare could be entered in binary with individual SPDT switches. But you know, binary is SOOOOO 50s. Hexadecimal knobs please !

    Unfortunately, my knobs are SPST. This means that one relay per bit is still required to transform the on/off signal into rail-to-rail signals. So we're back to 2 relays per bit. Convenience has won again...

  • Small progress

    Yann Guidon / YGDES, 11/03/2017 at 22:27

      Things are looking good on the front of hexadecimal display. #Numitron Hexadecimal display module will soon get its first PCB module so it's time to think about how it will be used.

      @llo offered to help with the project so I created a couple of diagrams. The first shows the general organisation of the YGREC computer (either 8 or 16 bits version) and how the displays and the assembler/disassembler are used to program, step into code, examine and inspect the processor and the system :

      In an ideal, hypothetical computer, the middle parts (with the MUXes) are removed. The instruction bus is connected directly to the processor and the PC is tied to the instruction ROM address bus.

      The system is "cut in half" to greatly enhance the manufacturability/testing, but also the design of the software and the hardware. These boards allow us to inject signals to probe into the processor.

      For example if you want to read the value of a given register : switch the assembler panel on, change the SRC knob to the desired register, and directly observe the value displayed by the SRC display from the processor.

      There are at least 5 digital displays and these should be sufficient for programming and troubleshooting:

      1. immediate data on the disassembler panel
      2. current instruction address
      3. SRC operand
      4. DST operand
      5. RESULT value

      So on YGREC8 that's 10 hex digits, and 20 for YGREC16. I don't count the status flags (Zero, Carry, Sign).

      (now I'm waiting for @llo  to make a clean version of the diagram ;-) )

      It was also time to upload the diagram of the processor's datapath. Here is the first draft, almost complete. It misses a few details but you have the most critical features right under your eyes:

      I'm hoping to get a proper, clean, computer version (SVG or PNG) "shortly" :-)

      The datapath is pretty straightforward : two operands are read from the registers, two immediate fields are injected (before and after the ALU) and the result is looped back to the register set.

      There is a little magic trick with PC to allow function calls : if OPCODE=CALL the destinations are swapped and the next PC is saved in a given register instead of PC. The computed address from RESULT is loaded into PC instead of the other registers, which implements a sort of Jump.
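The destination swap can be sketched as a small write-back function. This is a behavioural guess, not the datapath: it assumes that PC advances by 1 per instruction, that an ordinary instruction with DST=PC acts as a jump, and it ignores conditions. The name `write_back` and the register dictionary are hypothetical.

```python
def write_back(opcode, dst, result, pc, regs):
    """Return (new register dict, next PC) after one instruction."""
    next_pc = (pc + 1) & 0xFF      # assumption: one address per instruction
    regs = dict(regs)              # do not modify the caller's state
    if opcode == "CALL":
        regs[dst] = next_pc        # the return address goes to DST...
        return regs, result & 0xFF # ...and RESULT goes to PC instead
    if dst == "PC":
        return regs, result & 0xFF # plain write to PC: a jump
    regs[dst] = result & 0xFF      # normal case: RESULT goes to DST
    return regs, next_pc


if __name__ == "__main__":
    print(write_back("CALL", "R1", 0x40, 0x10, {"R1": 0}))
    print(write_back("ADD", "R2", 5, 0x10, {"R2": 0}))
```

Under these assumptions, returning from the function is just a write of the saved register back to PC.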

      Data memory is almost like the YGREC16 but with only 2 ports. I think the relay version will have maybe 64 bytes of DRAM only, to keep the system small and gain experience for the YGREC16 without using too many parts.

      You can see how conditions are managed : it's a simple MUX8. One input is constant (for NEVER and ALWAYS), there are the 3 usual status flags (Carry, Sign and Zero) and additional inputs, that can come from outside signals (à la /EF1-/EF4 of the CDP1802). The 4 signals make it easy to communicate with outside hardware, sensors, users...
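A minimal model of that MUX8 followed by the negation bit could look like this. The order of the MUX inputs is hypothetical; the constant input is taken as 0 so that an all-zero condition field gives NEVER, in line with the front page note that instruction code 0000h maps to NOP.

```python
def condition_met(select, negate, flags, inputs):
    """MUX8 on the condition sources, then an optional negation.

    select : 0..7, index into the (hypothetical) source order below
    negate : 0 or 1
    flags  : dict with logical values for "Z", "C" and "S"
    inputs : sequence of the 4 external signals B0..B3
    """
    sources = [0, flags["Z"], flags["C"], flags["S"], *inputs]
    return bool(sources[select] ^ negate)


if __name__ == "__main__":
    flags = {"Z": 0, "C": 1, "S": 0}
    pins = (0, 0, 1, 0)
    print(condition_met(0, 0, flags, pins))  # NEVER
    print(condition_met(0, 1, flags, pins))  # ALWAYS
    print(condition_met(2, 0, flags, pins))  # carry set
```

With one negation bit and 8 sources, the 16 combinations fit the 4-bit COND field described on the front page.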

      Now is the time to work on the assembler and disassembler panels.

      20171113 : the above diagram has a major flaw with the Imm8 MUX2 :(




Bartosz wrote 11/08/2017 at 16:40 point

this will working on epiphany or oHm or other cheap machine?


Yann Guidon / YGDES wrote 11/08/2017 at 18:07 point

I'm preparing a version that would hopefully use less than half of a A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !


Bartosz wrote 4 days ago point

Question is price and posibility to buy


Yann Guidon / YGDES wrote 4 days ago point

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCB and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
