A byte-wide stripped-down version of the YGREC16 architecture

YGREC can stand for many things, such as "YG's Relay Electric Computer", "Yann's Germanium and Relay Equipped Computers" or "YG's Ridiculous Electronic Contraption". You decide !

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of Data RAM (plus parity ?)
* 8 registers, 8 bits each (including PC)
* fewer relays/gates than the YGREC16
This core is so simple that I focus now on other issues, such as the debug/test access port, the register set's structure, I/O, power reduction...
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA, ASIC, you name it (as long as it uses boolean logic)!

After the explorations with #YGREC-РЭС15-bis, I reached several limits and I decided to scale it down as much as possible. And this one will be implemented both with relays and VHDL, since the YGREC8 is a great replacement for Microchip's PICs.

A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is now:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

The instruction word is shrunk down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field. Speed should be decent, with a pretty short critical datapath, and all the instructions execute in one clock cycle (except the LDCx instructions and computed writes to PC).

The fields have evolved with time (I have tried various locations and sizes). For example:

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...
20181024: changed the names of some fields
20181101: modified the conditions to change Imm3 into Imm4
20180112: Imm9 back again ! (for speed/latency reasons, no register operand is provided, an indirect IO register is used instead, and having more IO space is more desirable, otherwise only imm4 is available if a register operand is used)

There are 18 useful opcodes (plus INV, and the pseudo-opcodes HLT and NOP), and most share two instruction forms : either an IMM8 field, or a source & condition field. The source field can be a register or a short immediate field (4 bits only but essential for conditional short jumps or increments/decrements).

The main opcode field has 4 bits and the following values:

Logic group :

  • OR
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware : There is no point to ADD 0, so ADD with short immediate (Imm4) will skip the value 0 and the range is now from -8 to -1 and +1 to +8. (see 17. Basic assembly programming idioms)
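The zero-skipping rule can be modelled in a few lines of Python. This is only a sketch: the function name `decode_add_imm4` is hypothetical, and the exact mapping (non-negative field values bumped by one) is an assumption that merely reproduces the stated range of -8 to -1 and +1 to +8.

```python
def decode_add_imm4(field: int) -> int:
    """Map a raw 4-bit field to the effective ADD operand (assumed encoding)."""
    if not 0 <= field <= 0xF:
        raise ValueError("Imm4 field must fit in 4 bits")
    # Sign-extend the 4-bit two's complement field: 8..15 become -8..-1.
    value = field - 16 if field & 0x8 else field
    # Assumption: non-negative values are bumped by one so 0 is never encoded.
    return value + 1 if value >= 0 else value


# All 16 field values, decoded and sorted.
operands = sorted(decode_add_imm4(f) for f in range(16))
```

With this mapping the 16 codes cover 16 distinct useful operands, and none of them is wasted on a do-nothing addition.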

Shift group (optional)

  • SH/SA: direction is the sign of the shift, I/R (bit 9) is the Logic/Arithmetic flag.
  • RO/RC: direction is the sign of the shift, I/R (bit 9) allows carry to be rotated.

Control group:

The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available. All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S (Sign, MSB)
  • B0, B1, B2, B3 (for register-register form, we can select 4 bits to test from user-defined sources)

Instruction code 0000h should map to NOP, and the NEVER condition, hence ALWAYS is coded as 1.

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism): condition is implicitly ALWAYS because it's an IMM8 format.

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.

This project is more than an ISA or one implementation : the goal is to become a platform. See log 82. Project organisation

1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Syntax highlighting for Nano
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing...


x-compressed-tar - 372.71 kB - 11/20/2021 at 14:46


x-compressed-tar - 372.09 kB - 11/20/2021 at 08:56



assembler refactored, supports DW and re-assembly

x-compressed-tar - 360.31 kB - 11/18/2021 at 17:29



ALU8 still bork and assembler is incomplete

x-compressed-tar - 359.56 kB - 11/14/2021 at 08:08



a better assembler starts to work.

x-compressed-tar - 358.68 kB - 11/12/2021 at 06:18



  • My first prefix opcode...

    Yann Guidon / YGDES, 12 hours ago

    The previous log 139. Carry on made the case that a prefix would be necessary to solve a pretty difficult problem caused by the lack of opcode space.

    Such a prefix would eat some of the INV opcode space that is reserved so far. It's only a tiny dent though, and there is some room. So why not have a more potent prefix opcode ?

    The prefix could select the source of the carry bit, for example, so we can reuse the whole condition logic and corresponding fields (that's 4 bits though I'm not sure all will be useful). This value would be latched only for the next instruction. That will make exceptions very fun to handle.

    The prefix can also change the destination for the next instruction, to make YGREC8 a semi-2R1W architecture when needed (this could relieve some pressure during coding). So the SND field will be latched for later.

    I have no idea what the I/R, I/R2 and SRI fields could be used for, so far.

    •  I/R should not be touched, to keep the INV opcode. But then we lose an IMM8 at least.
    • SRI does not seem to have valuable information so far. Not sure it could hold MSB of IMM4 value.
    • I/R2 could select a different prefix. Some reserved opcodes will be required to return from exceptions for example.

    The first prefix uses 2 existing fields and leaves 3 bits in the middle.

  • A semi-decent output port

    Yann Guidon / YGDES, 06/10/2023 at 01:43

    When Y8 is integrated in a chip as a building block of a SoC or a microcontroller, the IO space implements some pin-altering registers, usually named GPIO. So you have configuration registers, read register and output register... This last one is often the tricky one.

    In the simplest cases, only a direct output from a DFF or Latch is implemented. This increases the code size and execution time because when you want to modify specific bits, you first have to read the previous port state (or read it from a cached copy somewhere), then mask the unwanted bits and/or OR the others, then finally write the result. And code space is often at a premium, particularly with only 256 addressable instructions !

    Some more modern chips provide alternate addresses for the output registers, providing additional features such as SET, CLEAR and TOGGLE functions. I start from SET and CLEAR because that's what I was discussing lately. They are indeed implemented straight-forwardly by a Set/Reset flipflop using only 2 NAND2 gates, or 8 transistors in CMOS (2 in RTL/DTL/CTL).

    So I take a basic S0R0 flip flop and add two NAND gates to selectively enable the clear and the set. This way, you just write a 1 to the bits you want to clear or set on the given port. Try by yourself, it's easy :-)

    Total : 4 NAND2 gates (16 transistors per pin), and they can even be paired to use smaller footprints with standard cells.
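From the programmer's point of view, the SET and CLEAR aliases behave as in the Python sketch below. It is a behavioural model only, not the gate-level circuit: the class name `OutputPort` and its method names are hypothetical, and an 8-bit port is assumed.

```python
class OutputPort:
    """Behavioural model of an 8-bit output port with SET and CLEAR aliases."""

    def __init__(self) -> None:
        self.pins = 0  # current state of the 8 output latches

    def write_set(self, data: int) -> None:
        # Each 1 in data sets the matching pin; 0 bits leave pins untouched.
        self.pins |= data & 0xFF

    def write_clear(self, data: int) -> None:
        # Each 1 in data clears the matching pin; 0 bits leave pins untouched.
        self.pins &= ~data & 0xFF


port = OutputPort()
port.write_set(0b1010_0011)    # raise pins 0, 1, 5 and 7 with a single write
port.write_clear(0b1000_0000)  # drop pin 7 only, no read-modify-write needed
```

Each update is one write, where a plain output register would need a read, a mask and a write.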

    Note that the DATA signal is latched from the Y8's register set's read port, it is pretty stable for a while (until a new instruction is fetched). The _SEL signals get decoded and take a bit longer to come alive, and they are only short strobes after the data has settled (otherwise it's a hell, you have to distribute the main clock signal everywhere...).

    The plain "copy" function is a bit more complex, but providing the 3 functions "copy", "set" and "clear" amounts to implementing a transparent latch with clear and set. So I'm mostly reinventing the wheel... with the small detail that the clear and set must be controlled by the input data (which acts as an "enable" pin), so it does not conform to a classic standard cell. Thus, let's dive in and have a look at the circuit made with CircuitJS:

    The "copy" function is quite easy for the "set" half : it is congruent with the "set" function indeed. So it's managed at the decoder level.

    But the system must copy the Zero value and that's the tricky part. It requires two more AND and an inverter.

    The structure with the 2 NAND2 converging to the AND is reminiscent of a XOR structure, except there are 3 inputs instead of 2. But the 3 gates could be merged into a single standard macrocell.

    Note also that the CPY_SEL signal can be bubble-pushed so the inverter on the data disappears. But this creates a OR which needs its own inverter anyway...

    There is this alternative version with some bubble-pushing, still using a non-inverting gate (AND) which has its own inverter. But the total amount of inverters has been cut in half. Sigh.

    For each bit of the port, there are 2 pairs of gates that can be merged into a single more complex standard cell each, and 2 gates that remain lonely... That's a total of 26 CMOS transistors, and a bit more for a general reset. To implement the general reset one needs a R1S1 cell instead and rebuild everything but the general reset signal will do its work cleanly.

    So here is the final result : 1 inverter, 3 NAND2, 2 NOR2 and 1 NOR3 per controlled bit, and you can play with it there.

    Et voilà.

    There is a function that is one order of magnitude larger to implement : the "toggle" function requires storing the last value in the port, which needs a full-blown DFF. It's sad and it's something I wish the Raspberry Pi implemented to make #SPI4C  practical to code.

    Anyway the current features are already nice and compact, decoding is rather simple, the timing should be good as long as DATA remains stable before and after the x_SEL pulse. I could add a...


  • Carry on

    Yann Guidon / YGDES, 03/29/2023 at 04:23

    I was wrong...

    I thought that having a carry flag would be enough to solve multi-precision arithmetic codes. It is not.

    The conditional execution of instructions does not solve it either.

    It is really necessary to have an ADC opcode that takes the carry bit as an input (and SBB as well).

    And this is very worrying because the opcode map was frozen and now I need 2 more opcodes while all the opcode space is taken.


    It's not that multi-precision addition is expected to be very common but when it occurs, it gets ugly : it takes multiple instructions and registers, it loses orthogonality. I have covered a possible trick at but I don't like it.

    ADC is easy to implement : it's just an AND gate between the output of the carry flag and the carry input of the adder. SBB is almost the same.
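To make the role of that gate concrete, here is a small Python model of an 8-bit adder whose carry input is gated, and a 16-bit addition built from one ADD and one ADC. The names `add8`, `add16` and `use_carry` are made up for this illustration; they do not come from the YGREC8 sources.

```python
def add8(a: int, b: int, carry_flag: int, use_carry: int) -> tuple[int, int]:
    """8-bit adder; use_carry=1 models ADC, use_carry=0 models plain ADD."""
    carry_in = carry_flag & use_carry  # the AND gate in front of the adder
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8    # (result byte, carry out)


def add16(x: int, y: int) -> int:
    """16-bit addition built from one ADD (low bytes) and one ADC (high bytes)."""
    low, carry = add8(x & 0xFF, y & 0xFF, carry_flag=0, use_carry=0)
    high, _ = add8(x >> 8, y >> 8, carry_flag=carry, use_carry=1)
    return (high << 8) | low
```

For example, `add16(0x01FF, 0x0001)` gives `0x0200`: the low-byte addition overflows and the gated carry bumps the high byte. Without the ADC path, the carry has to be tested and added with separate instructions.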

    The real problem is that I thought it was not necessary. The opcode map has been frozen and now I need 2 more opcodes. I could dump CMPU and CMPS but their "no writeback" behaviour will increase register pressure or bloat code in common code sequences.

    Any more complex reorganisation will break a lot of code as well as the electronic devices I have already built. It's not impossible but probably not worth it, as the opcode map has been thoroughly polished.

    The last solution I see is ugly in principle, but convenient : create a "prefix opcode" that enables the carry input.

    This also requires an update to the FSM.


  • A little explanation

    Yann Guidon / YGDES, 04/12/2022 at 20:09

    It seems that the "main schematic" below, used as the project's avatar image, is not obvious enough, at least before zooming in enough.

    @Ken KD5ZXG was not sure how to interpret/decipher the upper-left side.

      I see YGCREC8 mux between Carry Sign Zero Always and the lower four bits of B. What is B? I get what Write Enable and N are for, but how do these interact with the tested condition to drive some rectangle called 3=>8 that I might guess 8 way mux if you hadn't already drawn other mux with the proper symbol. Maybe the 3=>8 rectangle is just a decoder? I've got a condition mixed with N thats probably one of the 3 selects, a write enable thats not clear how it works (chip enable?), SND whose purpose remains a mystery drives some other select bits? Enlighten me, cause I'm clearly not getting it.

    So here is the explanation.

    • The 4 basic conditions are ACSZ, as already obvious. I now use the alphabetical order for convenience and as a mnemonic aid.
    • The 4 extended conditions are optional and user-configurable: either fixed (if you have 4 input/GPIO pins or other internal signals) OR you can implement a set of Special Registers that select the source of each condition bit. This is one of the tricks inherited from the YASEP.

    The home page says :

      The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. All conditions can be negated so we have :

      • Always
      • Z (Zero, all bits cleared)
      • C (Carry)
      • S (Sign, MSB)
      • B0, B1, B2, B3 (for register-register form only, we can select 4 bits to test from user-defined sources)

    So this extra condition bit allows extensions for later, which could speed up some IO intensive algos, such as bit banging.  "B" means "bit", it's not a register per se (though it must be latched before to prevent race conditions) and it is user-defined wires. They could be front panel switches, synchronous or asynchronous data over 1 or 2 bits... or some condition inside the extensions blocks like UART ready/overflow/whatever status bits. I was a bit inspired by the CDP1802 on this, I admit.

    The "Never" condition could be mapped to another bit/wire/condition but I don't want to play this game yet. ARM mapped this to an extension to the instruction set but YGREC8 is too young for those gymnastics yet.

    ... Maybe the 3=>8 rectangle is just a decoder?

    Yes, once the condition for writing the destination register is determined, it is sent to the appropriate destination register for writeback. I should have made it clearer but the drawing is already pretty crowded :-)

    The register set uses latches, and not DFF, to cut the register set power/area/cost in half. Imagine routing the clock signal to 64 bits and only updating 8 (or 16) every time... The diagram misses a buffer latch on the result datapath btw.

    I'll try to summarize :

    • First the condition must be evaluated. It is by default 1 : enable. The XOR can negate the condition when needed, directed by the relevant bit in the instruction word (in red). The condition can be any of A/C/S/Z/b0/b1/b2/b3 (the B conditions are optional as noted above). So it is quite simple : we have a bit that is usually 1, but also selectively cleared.
    • The condition is valid at the output of the XOR. Then this signal will
      • enable the update of the C/S/Z condition flags
      • go to the 3=>8 decoder to enable and select one byte of latches, so the relevant register (indicated by the SND field) is written (or not).
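The summary above can be written as a short Python model. It is a sketch under assumptions: the condition names are used as dictionary keys instead of their real binary codes, and the function names `condition_met` and `write_strobes` are invented for the example.

```python
def condition_met(name: str, negate: bool, flags: dict[str, int]) -> bool:
    """Select one condition bit and optionally invert it (the XOR stage)."""
    selected = 1 if name == "A" else flags[name]  # "Always" is a constant 1
    return bool(selected ^ int(negate))


def write_strobes(snd: int, enable: bool) -> list[int]:
    """3=>8 decoder: one-hot latch enables, all zero when the condition fails."""
    return [int(enable and snd == reg) for reg in range(8)]


flags = {"C": 0, "S": 1, "Z": 0, "B0": 0, "B1": 0, "B2": 0, "B3": 0}
taken = write_strobes(snd=4, enable=condition_met("S", False, flags))
skipped = write_strobes(snd=4, enable=condition_met("Z", False, flags))
```

Here `taken` has a single 1 at index 4 (the register selected by SND is written) while `skipped` is all zeros, so no register changes. Negating "Always" gives the "Never" condition.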

    I hope it helps.

  • counters

    Yann Guidon / YGDES, 11/22/2021 at 09:35

    In the log 129. Counters strike ! I started considering the new version of the "counters". It's easier said than done and the many clock domains don't ease the design. But I managed to draft one "bit":

    Each byte can be read with one byte select per counter (but the latency is quite high as the data ripple through many gates). Writing is another story because it's asynchronous so a byte clear precedes a selective set. Each byte has its own clock domain, selected among various sources, including the preceding counters. And the counter's value comes either from the local incrementer or the previous counter's value, for the cases of arbitrary frequency dividers (think: baud generator as a trivial example, the previous incrementer is left unused in this case).

    This is quite scalable, the size of the pool of counters can be configured at will. An 8×8-bit block seems like a good compromise but nothing keeps one from changing that.

    . . . . . . . . . . . .

    Looking further, 2 main concerns arise:

    1. Latency : I don't want the I/O space to be too constrained or constraining. There might be a cacophony of wires, registers and decoders that would probably slow everything down. So let's consider the I/O space as "mostly asynchronous" from the core's point of view. This means that IN and OUT should have a "completion" flag that lets the core resume operation once the IO is done. Which means I must adapt/change/update the instruction's FSM...
    2. The register map. So far I have identified that each counter byte could have 2 addresses: one for read and write of the value, the other for control and status. Thus there is no constraint on the total number of counters: implement as many as you like. However, timing and scheduling becomes critical at this point so look at the previous point.

  • Tri-mode TAP

    Yann Guidon / YGDES, 11/21/2021 at 07:36

    Now that I have an assembler, how do I upload the program into the core ?

    The TAP (Test Access Port) allows one to upload data, instruction by instruction, then make the core run them. However this is quite complex and not suitable for autonomous operation, like, when the core works alone.

    One thing I would loooove is to hook the circuit to a serial port of my computer and then cat a binary file through /dev/ttyS0 to the circuit. When 512 bytes are transferred, the processor starts the uploaded program. It's convenient from the user's point of view, though it doesn't allow full debugging and requires baud generation circuits. Some sort of external adaptation circuit on a dongle must be designed.

    Another interesting situation would be that the TAP circuit itself goes to fetch the program by itself. Usually it comes from a SPI Flash device, and I have already developed such a system in 2014/2015 for #WizYasep.

    I am familiar with SPI Flash devices: #SPI Flasher implements a few protocols already. SPI usually works with 4 signals :

    • MISO
    • MOSI
    • CLK
    • SEL

    Compare this to the TAP interface:

    • Din
    • Dout
    • CLK
    • R/W
    • /Reset (optional)

    It looks quite similar, right? Furthermore, the TAP contains the shift register and the other circuits that write to the program memory, which is indexed by the PC register (the latter conveniently wraps around to 0 when the upload is over, and signals the FSM to start running said program).
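The upload path described here (shift register, write to program memory, PC wrapping to 0 to start the program) can be sketched in Python. This is a behavioural guess, not the TAP's actual design: the class name `ProgramLoader` and the MSB-first bit order are assumptions, and the sample word is taken from the .hyx listing in the jump log.

```python
class ProgramLoader:
    """Shift 16-bit words into a 256-word program memory indexed by an 8-bit PC."""

    def __init__(self) -> None:
        self.memory = [0] * 256
        self.pc = 0
        self.shift = 0
        self.count = 0
        self.running = False

    def clock_bit(self, bit: int) -> None:
        if self.running:
            return  # upload finished, further bits are ignored in this model
        # Assumption: bits arrive most significant first.
        self.shift = ((self.shift << 1) | (bit & 1)) & 0xFFFF
        self.count += 1
        if self.count == 16:
            self.memory[self.pc] = self.shift
            self.count = 0
            self.pc = (self.pc + 1) & 0xFF
            # The PC wrapping back to 0 marks the end of the upload.
            self.running = self.pc == 0


loader = ProgramLoader()
image = [0x75BF] + [0x0000] * 255  # 256 words = 512 bytes = 4096 bits
for word in image:
    for i in range(15, -1, -1):
        loader.clock_bit((word >> i) & 1)
```

After the 4096th bit the PC is back at 0 and `loader.running` becomes true, which is the event that would tell the FSM to start executing.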

    So there are already about one half of the circuits in place for autonomous loading, now the trick is how to hack the existing system to add the new features.

    First, the Y8 interface needs to know the operating mode. So far, the TAP had only one mode so the question didn't arise but we have identified 3 modes, that would be conveniently selected by weak pull-up/down resistors on the pins:

    1. TAP mode : R/W low, CLK low (?)
    2. External/slave programming mode : R/W low, CLK high (?)
    3. Autonomous SPI programming mode : R/W high
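As a sketch, the mode selection boils down to a small decision function. The names `Mode` and `sample_mode` are hypothetical, and the CLK polarity for the first two modes is still marked "(?)" in the list above, so that part of the code is tentative as well.

```python
from enum import Enum


class Mode(Enum):
    TAP = "TAP"
    SLAVE_PROGRAMMING = "external/slave programming"
    AUTONOMOUS_SPI = "autonomous SPI programming"


def sample_mode(rw: int, clk: int) -> Mode:
    """Pick the operating mode from the pin levels sampled at reset."""
    if rw:
        return Mode.AUTONOMOUS_SPI  # R/W high: the CLK level is irrelevant
    # The CLK polarity below is marked "(?)" in the text, so it is tentative.
    return Mode.SLAVE_PROGRAMMING if clk else Mode.TAP
```

With the pins left floating on their weak resistors, `sample_mode(1, clk)` selects the autonomous SPI mode whatever the CLK level is.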

    It starts to look like what modern FPGA provide...

    Usually, the R/W pin is pulled up by an internal weak resistor, which is overridden by an external upload/control device. The default behaviour is to get the stream of 4096 bits from external SPI storage. I'm not sure yet about the CRC/scrambling, which can be designed/added later, but should not remain an afterthought for ever.

    The state of the R/W pin is sampled by the FSM during the Reset sequence, just after the /Reset pin is brought high. Note that the TAP is still functional even when the rest of the chip is held in RESET state, since TAP has its own reset sequence and clock domain.

    Note: the TAP used to have 3 pins only (plus external RESET though it could also be controlled from within the TAP registers). See for the diagrams. Now, in addition to the previously defined interface, the external debug/upload device must control the /Reset pin to take over the internal FSM and prevent conflicts with the internal operation. The minimal number of pins for the probe is 6 but more become desired:

    • Din
    • Dout
    • CLK
    • R/W
    • SSel (open collector)
    • /Reset (open collector)
    • Vtarget
    • GND

    It's on the "high tier" of the pin count range for probes but that's the price for sharing a SPI bus between 2 masters. It's also for safety due to the uncertainty of the support of "half-duplex" mode by SPI Flash chips. I added these 2 pins:

    • The debug probe would certainly want to access the SPI Flash for programming, and the SPI Flash should not interfere with the normal TAP operation. A separate pulled-up SPI Sel pin (SSel) becomes necessary so the SPI Flash is deselected during TAP operation.
    • I also added Vtarget because the TAP probe shouldn't operate if the target circuit is not energised. Some voltage translation buffers will ensure electrical integrity, prevent "ghost powering" through pins' protection diodes, and the probe must be sure that the target is correctly powered.

    I know it's getting more complex but not everything...


  • The YGREC8 Assembler

    Yann Guidon / YGDES, 11/20/2021 at 16:20

    This is a copy of the current documentation for y8asm :


    The YGREC8 Assembler

    Y8asm is a 2-pass assembler written in VHDL. It transforms source/assembly language .y8 files into .hyx files suitable for flashing/emulation. The VHDL code is compiled and elaborated with GHDL by the provided build/test script, and generates an executable program.


    The program can't include files or manage macros. Use external programs such as cpp or m4. You could also concatenate source files with cat to a temporary file.

    Program invocation

    The program runs on the command line interface or in a script. There are 3 active parameters:

    • Input file name: -gname

    $ ./y8asm -gname=example.y8

    will assemble the file named example.y8.

    The output file name is derived from this parameter, with the .hyx suffix.

    The output file will be overwritten without a warning.

    • Symbol table output: -gdump

    $ ./y8asm -gname=example.y8 -gdump=yes

    appends a dump of the user-defined symbols in the comments at the end of the .hyx output file.

    The dump also contains the number of times the symbol has been referenced (though it might be over-estimated because it includes both passes). That's still convenient if you want to clean up your source code and prune some useless lines.

    The option -gdump=full dumps all the defined symbols, including the reserved words, keywords, opcodes etc.

    • Maximum Symbol Length : -gmax_sym_len

    $ ./y8asm -gname=example.y8 -gmax_sym_len=12

    changes the maximum length of symbols (labels, identifiers etc.) from the default 16 characters to 12 characters.

    Basic Syntax

    • Comments start at ';' and remove the rest of the line.

    • All the symbols are translated to upper case during parsing.

    • The symbols can not be redefined, unless they are in a nested context (not yet implemented).

    • User's symbols have a range dependent on the VHDL simulator, "at least 32 bits". This is practical for intermediary values since the assembler checks each range for every instruction field.

    • Identifiers can contain the following letters: '_', 'A' to 'Z' and '0' to '9' (but no digit at the first position).

    • Separators are space ' ', comma ',', horizontal tab, and the non-breaking space, character 160 (&nbsp; in HTML).

    • The dollar sign '$' represents/returns the value of the current address.

    • Numbers are decimal by default. Binary numbers have the b suffix and h is the hexadecimal suffix.

    • Numerical computations ("expressions") are always between parentheses, to avoid precedence. The assembler supports the following arithmetic operations: '+', '-', '*', '/', '%'. VHDL does not allow easy boolean operations on integers, unless you go the std_logic_vector route, but I can't change the standard... I'll add later if needed.

    • By default, code is assembled starting at address 0. Don't forget to ORG if you need otherwise.
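The number syntax from the list above can be illustrated with a few lines of Python. This is not how y8asm does it (y8asm is written in VHDL); the function name `parse_number` is hypothetical and the requirement of a leading digit is an assumption, made so that numbers can not be confused with identifiers.

```python
def parse_number(token: str) -> int:
    """Parse a decimal, binary ('b' suffix) or hexadecimal ('h' suffix) literal."""
    text = token.upper()  # the assembler upper-cases symbols during parsing
    if not text or not text[0].isdigit():
        raise ValueError(f"not a number: {token!r}")  # assumed: leading digit
    if text.endswith("H"):
        return int(text[:-1], 16)
    if text.endswith("B"):
        return int(text[:-1], 2)
    return int(text, 10)
```

So `42`, `101010b` and `2Ah` all denote the same value. The 'h' suffix is tested first because 'B' is also a valid hexadecimal digit, as in `1Bh`.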


    The assembler provides some housekeeping commands that greatly help even basic programs.

    • END

    Ends parsing of the file. The source file can contain any garbage below this line.

    • DEF

    DEFines a new user symbol.

    The line

    DEF plop 42

    defines the symbol "PLOP" and assigns the value 42. The symbol "PLOP" can be used later in the source file, and even before for instructions and DW.

    See the -gdump=yes command line argument to list all the user-defined symbols.

    • Label:

    The line

    plop:

    is a shorthand for the line "DEF plop $".

    There is no separator before ':' and the label must be alone on the line.

    • ORG

    Change the address to which the next instruction will be assembled/stored.

    ORG 42

    means that the next instruction will be stored at address 42.

    The value may be a number, symbol or expression, but can not be post-defined or have a value lower than the current address.

    • DW

    DW 42

    will output the value 42 to the instruction memory space, as if it was an instruction.

    Just like an instruction, it is 16-bit wide and can have a post-defined value.
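Post-defined values are what makes two passes necessary: the first pass collects the symbols, the second one emits the words. The toy Python model below shows the idea for END, DEF, ORG, DW and labels only. It is not the y8asm algorithm: the function name `assemble` is hypothetical, operands are limited to decimal numbers and symbols, and the checks of the real assembler (redefinition, ORG going backwards, field ranges) are left out.

```python
def assemble(lines: list[str]) -> dict[int, int]:
    """Two-pass sketch handling only END, DEF, ORG, DW and 'label:' lines."""
    symbols: dict[str, int] = {}
    image: dict[int, int] = {}
    for final_pass in (False, True):
        address = 0
        for raw in lines:
            # Strip the comment, upper-case, treat commas as separators.
            words = raw.split(";")[0].upper().replace(",", " ").split()
            if not words:
                continue
            if words[0] == "END":
                break
            if words[0].endswith(":"):
                symbols[words[0][:-1]] = address  # same as DEF name $
            elif words[0] == "DEF":
                symbols[words[1]] = int(words[2])
            elif words[0] == "ORG":
                address = int(words[1])
            elif words[0] == "DW":
                if final_pass:  # every symbol is known by now
                    operand = words[1]
                    value = int(operand) if operand.isdigit() else symbols[operand]
                    image[address] = value & 0xFFFF
                address += 1
    return image


program = [
    "DW later   ; used before it is defined",
    "DEF plop 42",
    "ORG 4",
    "later:",
    "DW plop",
    "END",
    "any garbage",
]
image = assemble(program)
```

The first DW can only be resolved during the second pass, once the label `later` is known to sit at address 4.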


    The assembler pre-defines the following...


  • jumping back and forth, and carry

    Yann Guidon / YGDES, 11/20/2021 at 15:34

    The Y8 can jump:

    set label pc

    it can also jump relative :

    add (label-$) pc

    and it can even jump relative conditionally :

    add (label-$) pc ifc

    But then the range is limited because only 4 bits are available for the signed amplitude. And I have already sacrificed one condition bit...

    With the new assembler, here is the best that can be reasonably done:

    add (forward-$) PC ifnz
    nop ; 1
    nop ; 2
    nop ; 3
    nop ; 4
    nop ; 5
    nop ; 6
    nop ; 7
    nop ; -8
    nop ; -7
    nop ; -6
    nop ; -5
    nop ; -4
    nop ; -3
    nop ; -2
    nop ; -1
    add (backwards-$) PC ifz

    The output in .hyx:

    ; L1: add (forward-$) PC ifnz
    75BF ; @0: ADD   8 PC IFNZ   
    ; L2: nop ; 1
    0000 ; @1: NOP               
    ; L3: nop ; 2
    0000 ; @2: NOP               
    ; L4: nop ; 3
    0000 ; @3: NOP               
    ; L5: nop ; 4
    0000 ; @4: NOP               
    ; L6: nop ; 5
    0000 ; @5: NOP               
    ; L7: nop ; 6
    0000 ; @6: NOP               
    ; L8: nop ; 7
    0000 ; @7: NOP               
    ; L9: forward:
    ; = 8
    ; L11: backwards:
    ; = 8
    ; L12: nop ; -8
    0000 ; @8: NOP               
    ; L13: nop ; -7
    0000 ; @9: NOP               
    ; L14: nop ; -6
    0000 ; @10: NOP               
    ; L15: nop ; -5
    0000 ; @11: NOP               
    ; L16: nop ; -4
    0000 ; @12: NOP               
    ; L17: nop ; -3
    0000 ; @13: NOP               
    ; L18: nop ; -2
    0000 ; @14: NOP               
    ; L19: nop ; -1
    0000 ; @15: NOP               
    ; L20: add (backwards-$) PC ifz
    77C7 ; @16: ADD  -8 PC IFZ    
    ;;;; SYMBOL DUMP :
    ; * 'FORWARD'=8 ref:1 / sym_usr
    ; * 'BACKWARDS'=8 ref:1 / sym_usr

    A backwards loop could then contain 8 instructions (including a test for the end of the loop) but the forward jump can only skip over 7 instructions, despite the ability to encode the constant 8 when dealing with the PC register.

    The offset 1 is still possible and this represents the next instruction, which would be trivial to execute otherwise. And the offset 8 points to the 8th instruction after the skipped block, it's not the size of the skipped block.
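The reach of the short jump can be checked with a little Python arithmetic. The sketch assumes, as the listing above suggests, that the PC operand is the address of the ADD instruction itself; the names `SHORT_OFFSETS`, `jump_target` and `skipped_instructions` are made up for the example.

```python
SHORT_OFFSETS = [*range(-8, 0), *range(1, 9)]  # Imm4 values for ADD, 0 excluded


def jump_target(pc: int, offset: int) -> int:
    """Address reached by 'ADD offset PC' located at address pc (8-bit wrap)."""
    if offset not in SHORT_OFFSETS:
        raise ValueError("offset does not fit the short immediate")
    return (pc + offset) & 0xFF


def skipped_instructions(offset: int) -> int:
    """Instructions jumped over by a forward offset; offset 1 is the next one."""
    return offset - 1
```

With the listing's addresses, `jump_target(0, 8)` and `jump_target(16, -8)` both land on address 8, and `skipped_instructions(8)` is 7.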

    At least it's now impossible to do a pointless loop such as

    ADD 0 PC ifnz ; spin endlessly doing nothing

    To achieve a more practical goal, the operand should be the NPC, or PC+1, which is being computed at the same time as the addition. But this creates a whole lot of troubles, in particular:

    • if we compute PC+2 then the backwards jump will only reach 7 instructions
    • Timing becomes too tight, since the pipeline must choose between PC and PC+1 depending on the imm's sign
    • This will require a stall cycle, and there is already one because writes to PC must discard the prefetched instruction.

    At this point, the "short add trick" requires only a few logic gates (to detect the opcode, the format and the sign of imm4, detecting the PC register is not even necessary) and no deep modification of the state machine.

    Trying to squeeze one more instruction, to skip 8 opcodes, would complicate the whole circuit with quite little benefits...

  • Status 20211114

    Yann Guidon / YGDES, 11/14/2021 at 08:17

    The new source code archive is there ! YGREC8_VHDL.20211114.tgz

    Some pretty things are there though it's still missing quite a lot:

    • The assembler is not complete: it misses some keywords, thorough tests, padding, proper symbol definitions and backward patches...
      update 20211118: VHDL assembler is mostly functional. more tests, doc & examples are welcome, as well as nested expressions.
    • The ALU8 is more-or-less working, the tests run but find an error.
    • TAP is incomplete and more debug infrastructure is required.
    • Some files have been split and/or moved around
    • the gates library #Libre Gates project is still somehow evolving in parallel, in a soft fork that I'll have to reconcile.

    The core is still not complete... there are still many things to manage under the hood. But what works works well :-)

    And having a proper assembler helps a lot too. No more estimates, it's now possible to test and reproduce ideas! So I think it's the priority for the next days, then I'll go back to #Libre Gates  so I can fix the ALU and progress on the TAP, which is critical to enter programs, control the core and read back its status...

    The Shift unit and the register set will then be quite easy to design, I think.

  • Undecided overlay options

    Yann Guidon / YGDES, 11/12/2021 at 01:55

    Work is progressing nicely on the new assembler. This also allows me to find some corner cases that I didn't consider carefully yet. Let's look at the existing disassembler:

        if OPC=Op_CALL and SND=Reg_PC then
          if Imm9="111111111" then
            result(1 to 3):="HLT";
          else
            result(1 to 7):="OVL " & SLV_to_Hex(Imm8) & "h";
            -- /!\ bit 11 not used ?
          end if;
        end if;

    The HLT (halt) opcode uses all the 9 bits but the OVL (overlay) only uses 8, since only a byte can be managed by the overlay register.

    The 11th bit is not handled so there are 3 ideas that come to mind :

    1. extend the immediate field to work with IMM9 like IN/OUT (simplest)
    2. consider the R/I bit so the OVL instruction can use a register argument as well (useful to deal with multiple or indirect overlay numbers)
    3. create a new instruction that provides another functionality (which ?)

     The jury is still out.
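
As an illustration of option 1 only (nothing has been decided), here is a minimal sketch of what the disassembler branch could look like if the whole Imm9 field were printed. The package name `ovl_sketch`, the functions `hex_digit` and `decode_call_pc` and the three-digit hex output are assumptions made for this example, they do not come from the project's sources. It also assumes that the unused bit is the top bit of Imm9 and that HLT keeps its all-ones encoding.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package, for illustration only.
package ovl_sketch is
  -- Formats the 9-bit immediate of a "CALL to PC" instruction.
  function decode_call_pc(imm9 : std_logic_vector(8 downto 0)) return string;
end package;

package body ovl_sketch is

  -- Converts a value in 0..15 to one hexadecimal character.
  function hex_digit(n : natural) return character is
    constant digits : string(1 to 16) := "0123456789ABCDEF";
  begin
    return digits(n + 1);
  end function;

  function decode_call_pc(imm9 : std_logic_vector(8 downto 0)) return string is
    constant v : natural := to_integer(unsigned(imm9));
  begin
    if v = 511 then
      -- all ones: still reserved for HLT
      return "HLT";
    else
      -- option 1: the 9th bit becomes part of the overlay number
      return "OVL " & hex_digit(v / 256)
                    & hex_digit((v / 16) mod 16)
                    & hex_digit(v mod 16) & "h";
    end if;
  end function;

end package body;
```

Under these assumptions the all-ones pattern stays reserved for HLT, so only 511 overlay numbers (000h to 1FEh) would be encodable, and the overlay register would need a 9th bit.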


Yann Guidon / YGDES wrote 04/25/2021 at 21:57 point

This project is not dead, I'm just extra over-busy with more immediate concerns and priorities...

salec wrote 10/09/2019 at 09:18 point

YGREC can stand for so many things, but since my wife has been learning French on Duolingo I can't avoid noticing that it is also a wordplay on French spelling of "Y". 


Yann Guidon / YGDES wrote 10/09/2019 at 10:03 point

oh, of course, yes, too ;-)

salec wrote 10/09/2019 at 12:04 point

always have an opening joke/tease for audience :D

Yann Guidon / YGDES wrote 10/09/2019 at 12:46 point

@salec  always !


[this comment has been deleted]

Yann Guidon / YGDES wrote 04/14/2019 at 08:56 point

That "purposeful sense" may look drowned in the proliferation of projects, angles and ideas, but it is still clear to me: this has been my main hobby since 1998 at least :-D

I'm glad you enjoy !

Yann Guidon / YGDES wrote 11/04/2018 at 07:11 point

Another note for later :
writing to A1 or A2 starts a fetch from RAM. In theory the latency is the same as instruction memory and one wait state would be introduced. However the processor can also write directly so the wait state would be only on read to the paired data register...
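
A minimal sketch of the wait-state rule described in this note, under several assumptions: the register numbers follow the order of the register map (D1=0, A1=1, D2=2, A2=3), only one source register is checked, and the entity and signal names (`ram_read_stall`, `pend_A1`, `pend_A2`...) are hypothetical. It stalls for one cycle only when the instruction that follows a write to A1 (or A2) reads the paired D1 (or D2).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ram_read_stall is
  port (
    clk    : in  std_logic;
    wr_en  : in  std_logic;                     -- current instruction writes a register
    wr_reg : in  std_logic_vector(2 downto 0);  -- destination register number
    rd_reg : in  std_logic_vector(2 downto 0);  -- source register number
    stall  : out std_logic                      -- '1' = insert one wait state
  );
end entity;

architecture sketch of ram_read_stall is
  -- Assumed numbering, following the order of the register map
  constant REG_D1 : std_logic_vector(2 downto 0) := "000";
  constant REG_A1 : std_logic_vector(2 downto 0) := "001";
  constant REG_D2 : std_logic_vector(2 downto 0) := "010";
  constant REG_A2 : std_logic_vector(2 downto 0) := "011";

  -- '1' during the cycle that follows a write to A1 / A2
  signal pend_A1, pend_A2 : std_logic := '0';
  signal stall_i          : std_logic;
begin
  -- Stall only when the paired data register is read while its fetch is pending
  stall_i <= '1' when (pend_A1 = '1' and rd_reg = REG_D1)
                   or (pend_A2 = '1' and rd_reg = REG_D2)
             else '0';
  stall <= stall_i;

  process (clk)
  begin
    if rising_edge(clk) then
      pend_A1 <= '0';
      pend_A2 <= '0';
      -- A stalled instruction has not executed yet, so its write is ignored
      if wr_en = '1' and stall_i = '0' then
        if wr_reg = REG_A1 then
          pend_A1 <= '1';
        elsif wr_reg = REG_A2 then
          pend_A2 <= '1';
        end if;
      end if;
    end if;
  end process;
end architecture;
```

A write to D1 or D2 right after the address write never raises `stall`, which matches the remark that the processor can write directly.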

Yann Guidon / YGDES wrote 11/04/2018 at 06:55 point

Note for later : don't forget the transparent latch on the destination register address field, for the (rare) case of LDCx, because the 2nd cycle doesn't preserve the opcode etc.

Yann Guidon / YGDES wrote 11/04/2018 at 07:18 point

OK, not a transparent latch, but a DFF and a mux, plus some logic to control it.

-- DFF, every cycle :

SND_latched <= SND_field;

LDCx_flag <= '1' when (LDCx_flag='0' and opcode=opc_LDC and writeBack_enabled='1')   else '0';

-- MUX2 :

WriteAddress <= SND_latched when LDCx_flag = '1' else SND_field;


Note : LDCx into PC must work without wait state because it's connected directly to SRI, as an IMM8, and no extra delay is required. PC wait state is required for ADD/ROP2/SHL and IN.
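
The DFF-plus-mux fragment above can be wrapped into a self-contained entity, as a sketch only. The entity name `ldcx_waddr`, the `clk` port, the pre-decoded `is_LDC` input (standing for `opcode=opc_LDC`) and the 3-bit width of the SND field (8 registers) are assumptions added to make it compile on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ldcx_waddr is
  port (
    clk               : in  std_logic;
    is_LDC            : in  std_logic;  -- assumed: '1' when opcode = opc_LDC
    writeBack_enabled : in  std_logic;
    SND_field         : in  std_logic_vector(2 downto 0);
    WriteAddress      : out std_logic_vector(2 downto 0)
  );
end entity;

architecture sketch of ldcx_waddr is
  signal SND_latched : std_logic_vector(2 downto 0) := (others => '0');
  signal LDCx_flag   : std_logic := '0';
begin
  -- DFF, every cycle
  process (clk)
  begin
    if rising_edge(clk) then
      SND_latched <= SND_field;
      -- Set for exactly one cycle after an accepted LDCx;
      -- the test on LDCx_flag='0' prevents re-triggering during the 2nd cycle
      if LDCx_flag = '0' and is_LDC = '1' and writeBack_enabled = '1' then
        LDCx_flag <= '1';
      else
        LDCx_flag <= '0';
      end if;
    end if;
  end process;

  -- MUX2: during the 2nd cycle of LDCx, use the destination saved in the 1st
  WriteAddress <= SND_latched when LDCx_flag = '1' else SND_field;
end architecture;
```

During the second cycle the instruction word no longer holds the opcode or the destination, so `SND_latched` still carries the register number captured at the end of the first cycle, and `LDCx_flag` falls back to '0' on the next edge whatever `is_LDC` shows.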

Frank Buss wrote 10/27/2018 at 12:51 point

Do you really plan 8 byte-wide registers? This would require thousands of relays :-)

Yann Guidon / YGDES wrote 10/27/2018 at 14:26 point

no :-)

8 registers, 8 bits each = 64 storage bits.
1 relay per bit => 64 relays

The trick is to use the hysteretic mode of the relays :-)
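
For readers unfamiliar with the idea, here is a simulation-only behavioural sketch of one relay used as a storage bit. It relies on the general property that a relay pulls in at a higher coil voltage than the one at which it drops out. The entity name and the two threshold values are hypothetical, they are not measurements of the relays used in this project, and the actual circuit may differ.

```vhdl
entity hysteretic_relay is
  generic (
    V_pull_in  : real := 9.0;  -- hypothetical: contact closes at or above this
    V_drop_out : real := 3.0   -- hypothetical: contact opens at or below this
  );
  port (
    V_coil : in  real;     -- coil voltage
    closed : out boolean   -- contact state = stored bit
  );
end entity;

architecture behav of hysteretic_relay is
begin
  process (V_coil)
    variable state : boolean := false;
  begin
    if V_coil >= V_pull_in then
      state := true;    -- write '1'
    elsif V_coil <= V_drop_out then
      state := false;   -- write '0'
    end if;
    -- between the two thresholds the previous state is kept
    closed <= state;
  end process;
end architecture;
```

With the coil biased at a holding voltage between the two thresholds (6.0 with the values above), a short pulse above `V_pull_in` sets the bit and a short dip below `V_drop_out` clears it, so a single relay holds one bit.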

Frank Buss wrote 10/27/2018 at 16:17 point

Ok, makes sense. Maybe change the project description, someone might think you are planning a 64 bit architecture.
BTW, could this be parametrized for the address and data size? If you implement it in VHDL, you could use generics for this, would be no additional work to use just the generic names instead of hard coded numbers. Except maybe some work for extending the instruction opcodes.
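
A minimal sketch of what this suggestion means in practice, using a single register with write enable. The names `generic_reg`, `DATA_WIDTH`, `we`, `d` and `q` are made up for the example and do not come from the YGREC8 sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity generic_reg is
  generic (
    DATA_WIDTH : positive := 8  -- 8 for a byte-wide register
  );
  port (
    clk : in  std_logic;
    we  : in  std_logic;  -- write enable
    d   : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    q   : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity;

architecture rtl of generic_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture;
```

An instance would pick the width with `generic map (DATA_WIDTH => 16)`. The datapath scales this way, but the 16-bit instruction format and its field sizes do not.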

Yann Guidon / YGDES wrote 10/27/2018 at 17:16 point

Frank : DAMNIT you're right !

I updated the description...

Yann Guidon / YGDES wrote 10/27/2018 at 17:19 point

For the parameterization : it doesn't make sense at this scale. Every fraction of bit counts and must be wisely allocated.

Larger architectures such as #YASEP Yet Another Small Embedded Processor and #F-CPU have much more headroom for this.

Bartosz wrote 11/08/2017 at 16:40 point

Will this work on epiphany or oHm or another cheap machine?

Yann Guidon / YGDES wrote 11/08/2017 at 18:07 point

I'm preparing a version that would hopefully use less than half of an A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !

Bartosz wrote 11/14/2017 at 14:13 point

The question is the price and the possibility to buy.

Yann Guidon / YGDES wrote 11/14/2017 at 16:08 point

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCBs and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
