YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

YGREC can stand for many things, such as "YG's Relay Electric Computer", "Yann's Germanium and Relay Equipped Computers" or "YG's Ridiculous Electronic Contraption". You decide !

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of Data RAM (plus parity ?)
* 8 registers, 8 bits each (including PC)
* fewer relays/gates than the YGREC16
This core is so simple that I now focus on other issues, such as the debug/test access port, the register set's structure, I/O, power reduction...
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA, ASIC, you name it (as long as it uses boolean logic)!

After the explorations with #YGREC-РЭС15-bis, I reached several limits and I decided to scale it down as much as possible. And this one will be implemented both with relays and VHDL, since the YGREC8 is a great replacement for Microchip's PICs.

A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is now:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

The instruction word is shrunk down to 16 bits. It is still reminiscent of its older brother, the YGREC16, but I had to make clear cuts... To remove one field, the YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16. Speed should be decent, with a pretty short critical datapath, and all the instructions execute in one clock cycle (except the LDCx instructions and computed writes to PC).

The fields have evolved with time (I have tried various locations and sizes). For example:

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...
20181024: changed the names of some fields
20181101: modified the conditions to change Imm3 into Imm4
20180112: Imm9 back again !

There are 18 useful opcodes (plus INV, and the pseudo-opcodes HLT and NOP), and most share two instruction forms : either an IMM8 field, or a source & condition field. The source field can be a register or a short immediate field (4 bits only but essential for conditional short jumps or increments/decrements).

The main opcode field has 4 bits and the following values:

Logic group :

  • OR
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware: there is no point in adding 0, so ADD with a short immediate (Imm4) skips the value 0 and the range is now from -8 to -1 and +1 to +8 (see log 17, "Basic assembly programming idioms").
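
One way to picture the zero-skipping range is the sketch below. The bit assignment is an assumption made for illustration only (the real encoding is defined in the project logs, not on this page): the 4-bit field is read as a two's-complement number, and the non-negative values are bumped by one so that 0 never appears. The function name `decode_add_imm4` is hypothetical.

```python
def decode_add_imm4(field: int) -> int:
    """Map a 4-bit field to -8..-1 / +1..+8 (hypothetical encoding)."""
    if not 0 <= field <= 0xF:
        raise ValueError("Imm4 field must fit in 4 bits")
    # Read the field as two's complement: 0..7 stay as-is, 8..15 become -8..-1.
    signed = field - 16 if field & 0x8 else field
    # Skip 0: the non-negative half 0..7 is shifted up to 1..8.
    return signed + 1 if signed >= 0 else signed


values = sorted(decode_add_imm4(f) for f in range(16))
print(values)
```

With this mapping the sixteen codes cover sixteen useful increments and decrements, instead of wasting one code on a no-op.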

Shift group (optional)

  • SH/SA: direction is the sign of the shift, I/R (bit 9) is the Logic/Arithmetic flag.
  • RO/RC: direction is the sign of the shift, I/R (bit 9) allows the carry to be rotated.

Control group:

The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available. All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S (Sign, MSB)
  • B0, B1, B2, B3 (for register-register form, we can select 4 bits to test from user-defined sources)

Instruction code 0000h should map to NOP, so the all-zero condition code means NEVER and ALWAYS is coded as 1.

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism): the condition is implicitly ALWAYS because it's an IMM8 format.

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.


Logs:
1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Coloration syntaxique pour Nano (syntax highlighting for Nano)
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing the core
24. Synchronous Serial Debugging
25. MUX trees
26. Flags, PC and IO ports
27. Binary translation (updated)
28. Even better register set
29. A better relay-based MUX64
30. Register set again
31. Rename that opcode !
32. Register set again again
33. Yet Another Fork
34. What can it run ?
35. More register set layout
36. More VHDL and more...

CLA8_NAND.cjs

CircuitJS source code for the NANDified Carry Lookahead

cjs - 7.64 kB - 03/17/2020 at 00:05

Add8.cjs

source code of the CLA8 for falstad.com/circuit

cjs - 8.16 kB - 03/05/2020 at 23:24

YGREC8_VHDL.20200119.tgz

Added FSM ALU_2020 is bork, needs debug

x-compressed-tar - 143.02 kB - 01/19/2020 at 04:14

x-compressed-tar - 138.15 kB - 01/05/2020 at 05:34

YGREC8_VHDL.20191231.tgz

Last release of 2019, new start for 2020 with the new, ASIC-friendly, ROP2 unit

x-compressed-tar - 119.74 kB - 12/31/2019 at 02:05

  • Re-NANDifying the CLA

    Yann Guidon / YGDES - 03/14/2020 at 04:50 - 0 comments

    My efforts in "Pushing more bubbles, now the carry-lookahead adder" were promising but a bug somewhere made them vain. So I restarted from scratch instead of trying to dig too deep in my own code. That's how I came up with the results of "Bitslice" and now I have a big advantage: I can have arbitrary polarity of the inputs and outputs of the CLA logic and I have more freedom to choose the gates.

    Also this time I should make more progressive alterations to the design to catch errors earlier. Like : I should test it bit by bit and build the exhaustive test at the same time :-)
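
    As an illustration of what such an exhaustive test can look like, here is a minimal Python sketch. It is not the gate netlist of this project: `add8_gp` is a hypothetical reference model that only uses the per-bit generate (G) and propagate (P) signals and the carry recurrence that a lookahead tree flattens. With 8-bit operands and a carry-in there are only 2 × 256 × 256 cases, so every one of them can be checked.

```python
def add8_gp(a: int, b: int, cin: int) -> tuple[int, int]:
    """8-bit add expressed with generate/propagate signals (reference model)."""
    carry = cin
    total = 0
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi                  # this bit generates a carry by itself
        p = ai ^ bi                  # this bit propagates an incoming carry
        total |= (p ^ carry) << i    # sum bit
        carry = g | (p & carry)      # carry into the next bit
    return total, carry


def exhaustive_check() -> int:
    """Compare against integer addition for every input; return the case count."""
    cases = 0
    for a in range(256):
        for b in range(256):
            for cin in (0, 1):
                expected = a + b + cin
                assert add8_gp(a, b, cin) == (expected & 0xFF, expected >> 8)
                cases += 1
    return cases


print(exhaustive_check())
```

    The same loop can drive a partially built circuit: checking a few output bits at a time catches a polarity error as soon as it is introduced.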

    All the G and P inputs benefit from the inverter (which is in fact the output of the NOR2 or NAND2 of the ROP2 bitslice) and they all have a fan-in of 1 so the NOR2 and NAND2 only need a fanout of 2.

    The other signals have a fan-in of 2, 2, 3, 1, 2, 0, which is reasonable. Cin has a serious fanout but is ready much earlier so it's not critical.

    For P2(0) I replaced the AND3 with a NOR3. This provides the signal earlier than G2(0) because the inverted inputs arrive one inverter earlier (and there is only one driver layer).


    The second block of the CLA is quite similar (which is not surprising since it is more or less copied from the LSB part).

    The same recipe is applied. The AND-OR is replaced by NAND-NAND, AND3 is replaced by NOR3.
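
    Both substitutions are plain applications of De Morgan's laws, which a few lines of Python can confirm over all input combinations. This is only a sanity check of the boolean identities (the helper names are made up for the sketch), not a model of the actual circuit; the NOR3 identity assumes the three inputs are available in inverted form, as described above.

```python
from itertools import product


def nand(*xs: int) -> int:
    return 0 if all(xs) else 1


def nor(*xs: int) -> int:
    return 0 if any(xs) else 1


def and_or(a: int, b: int, c: int, d: int) -> int:
    """(a AND b) OR (c AND d)"""
    return (a & b) | (c & d)


def nand_nand(a: int, b: int, c: int, d: int) -> int:
    """Same function built from three NAND gates."""
    return nand(nand(a, b), nand(c, d))


# AND-OR equals NAND-NAND for all 16 input combinations.
for a, b, c, d in product((0, 1), repeat=4):
    assert and_or(a, b, c, d) == nand_nand(a, b, c, d)

# AND3 of the true inputs equals NOR3 of the inverted inputs.
for a, b, c in product((0, 1), repeat=3):
    assert (a & b & c) == nor(1 - a, 1 - b, 1 - c)

print("equivalent")
```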

    I had to insert two inverters, one on G2(1) because it's used on the inverting input of 2 blocks, so I renamed the signal /G2(1).

    All the inverted input signals are used, and only once, just like before.


    The last bits and the carry output are pretty similar:

    Note how each G and P input is used, only once for each polarity.

    The gates on the left have some freedom for re-interpretation.

    The carry output is XORed but there is no penalty because the other outputs are XORed as well but in the ROP2 and SUXEN level.

    The new source is there : https://cdn.hackaday.io/files/272801167147520/CLA8_NAND.cjs

  • Bitslice

    Yann Guidon / YGDES - 03/13/2020 at 00:24 - 0 comments

    The ROP2 and ALU part has been slowly expanding to the SUXEN but the log "70. The nexus" reminds me that something is missing: I forgot to include the PC+1 (NPC) value. So another level of MUX is required, which is fortunate because I had also left out the SHL result. I can then use another full MUX3.

    In a previous log, "ROP2 with Falstad", I came up with this diagram:

    and it seems it must be extended a bit with another layer of MUX3 (source):

    The CDP of the whole stack is about 10 simple gates and I have not counted the CLA or the IN port.
    It's pretty satisfying to see that whole "datapath" in one picture, at last :-)

    You can see a lonely inverter on the /X signal. This is an optional correction for the output polarity of the CLA. It can be omitted if needed, it's on the "slow path" and provides some degrees of freedom for the CLA design.

    Speaking of slow paths : there is one OR just before the ROP2_out signal but it looks uncompressible and not critical so I leave it here. The input XOR for SND is critical though. I'll see how I can reduce the output XOR from CLA, there is a fun trick to play with BJT ("enable" by playing with the CLA_EN signal tied to the pull-up resistor of the interlocked pair).

    Total gates: 20 (13 NANDx, 2 XOR, 3 INV, 1 OR, 1 NOR).

    That's 160 gates for the 8-bit datapath (ignoring the CLA and SH circuits). It looks pretty easy to lay out and route but the outputs and the inputs will be located on the same side to ease routing of the register set. I'll probably move to a 3-tiles high organisation for FPGA & ASIC.


    Update :

    I might have found a trick to save a bit of stuff somewhere :-)

    The idea is to combine NEG and PASS_EN at the XOR input level, which saves one NAND2 and reduces the NAND3 to NAND2 (which can also add one more input in the datapath if needed). There is a need however to get /L somehow/somewhere but a XOR contains 2 inverters anyway. But NEG and PASS_EN can be controlled at the decoder level and the other inverter is moved/shared.

    Oh and I also replaced the OR (for AND_EN) with a NAND2, one input has an inverter while the other input can be inverted at the decode level.  (source)

    PASS_EN is renamed to PASS_SND because it makes more sense.

    I have also added the Zero detection. That would be an OR8 (one way or another).

    Here we see the signal going from SRI to the Result output, NEG and PASS_SND are disabled so the value flows through the OR logic : OrXor_en, ROP2_en, MX_en are enabled.

    In the decoder's logic, AND_EN, NEG and PASS_SND are affected. This removes several "don't care" situations.

         NEG PASS_SND AND_en XOR_en OrXor_en ROP_en CLA_en IN_en MX_en
    OR    0      1      0      0       1      1       0      0      1
    XOR   0      1      0      1       1      1       0      0      1
    AND   0      1      1      1       0      1       0      0      1
    ANDN  1      0      1      1       0      1       0      0      1
    SUB   1      0      x      1       1      0       1      0      0
    ADD   0      1      x      1       1      0       1      0      0
    PASS  0      0      x      x       1      1       0      0      1
    IN    x      x      x      x       x      x       0      1      0
    clear x      x      x      x       x      x       0      0      0
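
    The enable columns can be read as a simple "gate, then OR" selection. The sketch below is a behavioural simplification in Python: it assumes each unit's 8-bit output is ANDed with its enable and the results are ORed together, and it ignores MX_en and the internal ROP2 controls. The function name and the sample values are made up for the example.

```python
def result_bus(rop2: int, cla: int, in_port: int,
               rop_en: int, cla_en: int, in_en: int) -> int:
    """OR together three 8-bit sources, each gated by its enable bit."""
    def gate(value: int, enable: int) -> int:
        return value & (0xFF if enable else 0x00)

    return gate(rop2, rop_en) | gate(cla, cla_en) | gate(in_port, in_en)


# ADD/SUB rows: only CLA_en is set, so the adder output reaches the result.
print(hex(result_bus(0x0F, 0x3C, 0xA5, rop_en=0, cla_en=1, in_en=0)))
# "clear" row: nothing is enabled, so the result is 0.
print(hex(result_bus(0x0F, 0x3C, 0xA5, rop_en=0, cla_en=0, in_en=0)))
```

    Because a disabled source contributes only zeros, no priority logic is needed as long as the decoder never enables two sources at once.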

  • NAND3

    Yann Guidon / YGDES - 03/09/2020 at 03:15 - 0 comments

    Reader warning : this log/post touches the fundamental things that make me the most passionate about digital design and architecture. Playing with TTL chips, relays and transistors is a fun game but here you have a glimpse of some damned serious matters. This log justifies several aspects of my design choices so strap your belt and learn a few things.


    I aim at building the YGREC8 with various technologies (mainly for fun and giggles) but with the same ISA, so the different implementations can execute the same programs, as well as with the same structure and even the same gatelist (except for the relays version). This means that I focus on the manual synthesis of the design and I break every function down into individual gates. I choose the lowest common denominator for the chosen technologies and then I reuse the same gatelist without trying to overoptimise too much for each target...

    This means I must also choose the right structures and keep them (except for the relays). For example, the ALU will be (mostly) identical, with the same CLA because I don't want to re-engineer the system for every new implementation.

    The ProASIC3 and the relay versions favour the MUX2 as the atomic, do-everything gate but I intend to use ASIC/CMOS as well as bipolar discrete transistors, which require simpler gates.

    • Bipolar gates really prefer the NAND function. It's really the simplest, so it should be the fastest...
    • CMOS ASIC technology loves both NAND and NOR (they are symmetrical) but they have a practical limit for the number of inputs. Apparently 3 is a compromise between size and speed because more inputs would put too many pass transistors in series, which would slow down the gate, or force the channel to be too large to compensate (and increase capacitance).

    So the "preferred gates" are NAND2 and NAND3.

    Others like latches, NOR, INV and XOR are accepted where needed. For example I have studied the structure of the latches and XOR in several previous logs on other projects (for example the XOR zoo)

    Having more inputs to the NAND would be a big benefit to reduce the size and increase the overall speed:

    • This lets MUX have more inputs and fewer levels, which is better
    • The carry lookahead (and incrementer) can have a coarser granularity, fewer levels and a shorter critical datapath
    • and I probably forget a few other units, SHL would be a good candidate as well.

    Bipolar discrete circuits can have many inputs, 4 would not be a concern, maybe 8 is possible before running into signal integrity issues. The question is: is it a good choice for CMOS ?

    I have recently received an answer from @Staf Verhaegen :

    "Multi-input cells are mainly power and area optimization and not performance. Area optimization is trivial due to reduced number of transistors; power optimization is due to the removal of internal switching nodes.
    I haven't looked deeply in maximum number of series transistors in a design but typically one does not go above four. Going more would need big transistors and likely not that much would be used by synthesis anyway.
    "

    Thank you for the context expansion :-)


    Let's see how/why this is so.

    CMOS obeys a few rules, in particular t = RC, so the goal is always to minimise resistance and capacitance.

    Capacitance comes from the gate regions because the thin area where the poly overlaps diffusion creates a capacitor. The smaller the gate, the faster.

    However resistance comes from the relative width of the area through which current flows. The smaller the section, the less current flows, so the width must be maximised to make fast circuits.

    So there is this basic compromise : if you make a transistor wider you increase the current hence the speed but this also increases the capacitance, which reduces the speed...

    And this is for one transistor. CMOS gates need transistors in parallel and series ! And the more inputs, the more in series, and the more resistance, which also reduces the speed...
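
    A rough way to put numbers on this compromise is the normalised first-order model below. It is a deliberate oversimplification (a unit transistor has resistance 1 and gate capacitance 1, resistances in series add up, width divides resistance and multiplies capacitance) and the function names are made up for the sketch, so take it as an illustration of the trend rather than a sizing rule.

```python
def relative_delay(n_series: int, width: float = 1.0, load_cap: float = 1.0) -> float:
    """First-order RC delay of n series transistors driving a fixed load."""
    resistance = n_series / width   # series devices add up, width divides
    return resistance * load_cap


def relative_input_cap(width: float = 1.0) -> float:
    """Gate capacitance seen by the previous stage grows with the width."""
    return width


for n in (1, 2, 3, 4):
    # Unit-width stack, then a stack widened by n, then what that widening costs.
    print(n, relative_delay(n), relative_delay(n, width=n), relative_input_cap(n))
```

    In this model a stack of n unit transistors is n times slower, and widening each one by n restores the delay only by presenting n times more capacitance to whatever drives the gate.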

    • 2-input gates are a bare theoretical...

  • ROP2 with Falstad

    Yann Guidon / YGDES - 03/07/2020 at 01:12 - 0 comments

    Note: this log is obsoleted by Bitslice


    After the last log, "Adder with Falstad", I also converted the ROP2 bitslice to the interactive simulator :

    There is still the challenge to disable the output of the CLA so it can be combined by ORs with the rest of the units.

    My guess was to decompose the end XOR and include the "enable" signal at the end of the combining gate but I get the wrong polarity. Instead I have this solution for XA1:

    however the fanout for the "enable" signal is doubled and this feeds 2 gates with 3 inputs, which uses more space...

    It's hard to reduce the XOR gate and each technology has its own tricks up its sleeve to implement it, so I prefer to keep the XOR as is and the output is MUXed with a classic 2-level NAND circuit:


    Now I have moved the final XOR of the CLA back into the ROP2 circuit, which also saves one gate because I duplicated a (N)OR. The final XOR is driven by the "shared XOR" which has a reverse polarity so you can see I added an INV but it can be bubble-pushed in the CLA or in the final MUX as needed. Source:

    The truth table is updated :

          NEG  PASS_en AND_en XOR_en OrXor_en ROP_en CLA_en IN_en
    /OR   0      x      x      0       1      1       0      0
    \OR   0      x      0      x       1      1       0      0
    XOR   0      0      1      1       1      1       0      0
    AND   0      0      0      1       0      1       0      0
    ANDN  1      0      0      1       0      1       0      0
    SUB   1      x      x      1       1      0       1      0
    ADD   0      x      x      1       1      0       1      0
    PASS  x      1      x      x       0      1       0      0
    IN    x      x      x      x       x      0       0      1
    clear x      0      0      0       0      0       0      0

    Simulation with Falstad helped uncover some non-trivial "don't care" states.

    The PASS code is back to "ROP2 land" but this shouldn't create too many problems in the decoder.

    I added the IN instruction and left the SH codes for a future version, so it fits with the final NAND3 gate.

    Some signals such as AND_en are updated because the ROP2_en "shadows" them.

    It seems XOR_en and AND_en could have their names swapped ? AND_en is 1 only for XOR, and XOR_en is 0 only for OR...

  • Adder with Falstad

    Yann Guidon / YGDES - 03/05/2020 at 23:15 - 0 comments

    So Falstad is a reasonably potent logic simulator that lets me input schematics easily and test them.

    I had hit a bug in the ALU8's CLA when porting the #VHDL library for gate-level verification back to #YGREC8 and went to other sides of the project, because I didn't feel the energy to go back through all the optimisations I made. But thanks to Falstad I can do it interactively...

    So I went back to the main diagram and rebuilt the whole thing in Falstad's simulator:

    The source code is so large I can't add it as a link so it's in the file Add8.cjs.

    Slowly, little by little, I can resume the "bubble pushing" that created the nasty bug, but this time I can avoid it :-)

  • An unexpected but welcome tool

    Yann Guidon / YGDES - 03/05/2020 at 11:37 - 0 comments

    You must know Falstad's circuitjs simulator, and I've been using it for some weeks for analogue designs. It's not perfect, I have found quite a few quirks, but it is also a logic simulator: not a highly powerful one, but capable and interactive !

    It took little time to draw the schematic of the INC8 unit and now I wonder why I wasted so much time doing it with Dia when I could also simulate the result and provide the source code (click here !)

    With a few clicks I was even able to see where I made a mistake in the wiring.

    I have stumbled upon a roadblock with the ALU8 and was feeling lazy to make a deep analysis of my mistakes, Falstad's circuitjs looks like the handy solution to that :-)

    Now, is it possible to convert the netlist (extracted by my new tool) to display it with Falstad ? Or vice versa ?

  • Magnetic interferences

    Yann Guidon / YGDES - 01/30/2020 at 20:35 - 2 comments

    It strikes me only now that I must have under-estimated the importance of magnetic interferences between relays...

    I remember seeing placement recommendations for miniature Chinese relays but the РЭC-64 has a tubular shield. How do the openings at both ends behave ?

    Having two relays on a well-spaced prototype breadboard can't show the effects of many relays packed densely and switching with weird patterns.

    I'm starting to consider using mu-metal sheets but I wonder if it's practically effective and the right solution, because I still have some freedom to organise the parts in space and optimise the magnetic field...

    Should I start playing with, or even build, a flux-meter ?


    Update :

    Here is the seller for the RES-64 :

    https://www.ebay.com/itm/RELAY-RES-64A-726-REED-SWITCH-9-11V-NOS-USSR-LOT-OF-20PCS/232803573730

    It's a SPST reed relay, you have the two contacts going out of the glass tube available at the opposite ends. The glass tube would be surrounded by the electromagnet coil, the whole is inserted in the metallic tube to further direct the magnetic field and shield a bit from outside influence. I still have to examine a non-working piece to confirm. An additional pin connects the "case" to ground (for example).

    In the expected configuration, the relays will be paired and receive the same current, except during a set/reset pulse.

    I'll have to check and measure the magnetic field at the ends of the relays. That's one excellent reason to finally use all those UGN3503 I bought for another project !

    In the end, working in pairs might solve the problems I imagine so far.

    One way to see it is with both relays forming a magnetic loop, to close the static field. I'll just have to find a way to loop the magnetic field, for example by cutting a torus in half. This ensures that the pair of relays is closely coupled, little energy will leak to the closest neighbours.

    However, I suspect that the real problem is not the static field but the pulsed/forced changes when a capacitor discharges. This is what can affect sensitive neighbour relays but there is a catch : one relay is pulsed with the opposite polarity of the twin relay... In ALL cases, the magnetic pulse will go against the static field of one relay, while also reinforcing the field of the other relay. There must be an opposition of fields somewhere, a magnetic "hot spot" that can interfere with the nearby relays.

    Ideally the programming pulse should have the same polarity for both relays. First it would prevent/limit the cases where one relay has a state different from its twin, in particular during power-on. Second : it would allow the magnetic pulse to be "looped" in a closed magnetic circuit, thus removing many causes of magnetic leakage and interference. The problem is that it would easily double the power drawn by the register set, since there would be 6V to be dropped in resistors... The whole register set would dissipate 2W instead of 1W.

    Yes this log needs more drawings...


    20200327 :

    I just found new information in a totally awesome book dedicated to relays !

    The book is "Electric Relays Principles and Applications" by Vladimir Gurevich and it covers occidental as well as soviet relays. A truly fascinating encyclopedia that turns an apparently dumb device into a marvel of engineering !!!

    Notice the element n°5 : what is a ferroelastic disk ? anyway it might prevent the magnetic field from escaping from one end, which is also great to reduce interferences from neighbouring switching relays...

  • And now, capacitors !

    Yann Guidon / YGDES - 01/24/2020 at 16:53 - 0 comments

    The relay-based version of the YGREC-8 was in limbo due to delays in the delivery of required parts. I'm expecting more RES-64 to arrive in a few weeks, after ridiculous back-and-forth between post offices on strike. Meanwhile I was able to progress with #VHDL library for gate-level verification  in amazing ways but... My soldering iron is asking for action !

    Fortunately I received other parts from Russia (thank you eBay !) and I'm listing them to keep track of their intended use.

    Those parts are pretty oversized, compared to today's technology, but the looks/appearance/style is worth it and the whole will be coherent ;-)


    20× PETP K73-16 63V 2.2µF

    These are non-polarised capacitors with medium value.

    They are useful for 2 cases :

    • for CCPBRL: the coupling between stages requires a capacitor but a polarised one forces the use of two shifted power supply domains. Non-polarised capacitors simplify the power supply design, as well as logic design in some corner cases. However, 2.2µF might not cut it for the RES-15...
    • for the high-fanout buffers such as the ones described below. Some simulations with Falstad have shown that a high value would create an oscillation that could interfere with the rest of the circuit. A low value however wouldn't transfer enough energy from one side to the other. In both cases, the purpose is to prevent arcing at the contacts of the control relay at the bottom of the drawing.

    I don't think the YGREC8 needs 20 high-fanout signals but at least I'll be ready. The data memory system requires 5 buffers, the instruction memory might need a few more, but it is a reasonable approximation.

    Of course I'll have to experiment, test, verify, measure... I expect to make another video when it's done :-)

    Verdict : great surprise !
    These capacitors aged very well and maintained excellent isolation as well as precise capacitance : +/- 2% worst case ! I don't know about the inductance but it should work very well.

    20× Inductors Kig 0.1 1000μH

    High-value, low-current inductors

    These parts will "isolate" the various bitplanes from the main power supply.

    Each of the 8 bitplanes contains at least 16 RES-64 to store the values from the register set, but these planes are quite sensitive to external interferences. A "pi" network is used : each bitplane has a local large-value capacitor, added to the large value of the power supply, and the bitplanes can emit and receive pulses that could flip other states...

    The current rating is low but compatible with a single bitplane : each relay uses approx. 2.5 mA, a total of 20 mA, which gives a 5× margin with this 100 mA part.

    Verdict : good !
    8.5-9.1 ohms is a bit much, but the inductance is around 960µH, a few percent of variation.
    It should work well...

    8x Capacitor K50-24 16 V 2200µF

    high-value, medium-voltage power-storage capacitor

    There are 8 of them, just as needed for the 8 bitplanes. Ideally they filter the power to the 16×RES-64. However due to the high capacity and the low current rating for the inductor, there is the risk of blowing up the inductor in the case where the input is (accidentally) shorted. A series germanium diode (or 2 in parallel for higher current capacity and longevity) would prevent the damage.

    Verdict :
    These parts aged but should be "good enough", with a self-resistance around 1M and average capacitance in the 1600-1800µF range. ESR might be high though. They can be used for local power supply filtering.
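
    To get a feel for what this inductor and capacitor do together, the corner frequencies can be estimated with the textbook formulas for an ideal LC low-pass and for an RC low-pass formed by the winding resistance. This is only a back-of-the-envelope sketch: it ignores the ESR and leakage of the capacitor and the load of the relays, and it takes 1700 µF and 9 ohms as assumed mid-range values from the measurements quoted above.

```python
import math


def lc_corner_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant (corner) frequency of an ideal LC low-pass: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def rc_corner_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """Corner frequency of an RC low-pass: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


nominal = lc_corner_hz(1000e-6, 2200e-6)       # values printed on the parts
measured = lc_corner_hz(960e-6, 1700e-6)       # assumed mid-range measured values
winding = rc_corner_hz(9.0, 1700e-6)           # assumed ~9 ohm winding resistance

print(f"LC nominal {nominal:.0f} Hz, LC measured {measured:.0f} Hz, RC {winding:.1f} Hz")
```

    Both corners sit far below the frequency content of a fast switching pulse, which is the point of giving each bitplane its own filter; which of the two dominates in practice is something only a measurement on the real circuit can tell.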

    Update:

    I have "reformed" the capacitors through the slow and long application of current and the leakage has been significantly reduced.

    I applied this method to the larger caps below as well.


    8× K50-16 25 V 5000µF

    These big babies belong obviously in the power supply.

    Not much more to say. It's going to be quite massive but I want to avoid as much ripple as possible.

    There will be several voltages so at least a couple will...

  • Core state machine(s)

    Yann Guidon / YGDES - 01/12/2020 at 21:07 - 0 comments

    "Meanwhile..."

    I'm considering writing a quick&dirty&short behavioural simulation of the Y8 core to get a refreshed higher level view (as well as an alternate platform that simulates faster to try code out). I've already written one 2 years ago but it left a lot to be desired. Today's core has a more refined ISA and many details have evolved and matured. And I have a cool assembler I can throw at more challenges!

    The core is still simple, anyway, with "most instructions executing in a single cycle", with 2 exceptions :

    1. Write to PC (except SET and CALL that can write directly to PC and bypass the slow ALU path)
    2. LDCL/LDCH that need one more cycle to read the instruction memory.

    and then you realise you haven't thought about LDCx writing to PC, which makes things even more complex but who would do it ? (well, if 512 instructions of the 64K space allow it, there is 1 chance in 128 it gets executed so it's not negligible).

    A proper FSM is clearly required. And soon, it appears that it's not going to be a nice and tidy FSM like you see in tutorials because I need 2 FSMs.

    • One FSM handles the instructions and the special cases : stop/inst/LDCx/WritePC.
    • Another FSM handles the execution state: start/step/stop/(re)set/load

    These can't be completely joined because they are orthogonal. The execution FSM is what is visible from the outside, particularly during debug sessions. It must also handle internal initialisation when the chip goes out of /RESET (like: load data from external memory). The instruction cycle FSM cares for sequencing single and complex instructions.

    So it makes sense to separate the 2 FSM because it untangles the complex combinations that might arise and prevents naughty bugs and race conditions.


    Before going further, let's recall how the Y8 works : there is no real pipeline, yet 2 phases overlap:

    1. A first phase computes the next address (Increment PC) and/or gets a new address from an instruction, then fetches the instruction memory to read the next instruction.
    2. The second phase gets a ready instruction (or immediate word), then decodes/fetches operands/executes/writesback stuff.

    The FSM must ensure that the instruction memory is fetched before starting the execution phase.

    It gets more complex with the LDCx instructions because another round goes through the instruction memory. And if the destination is PC, yet another cycle is added to fetch the instruction instead of using PC+1.
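
    The cycle counts implied by these rules can be summarised in a few lines of Python. This is a behavioural sketch, not the FSM itself: `cycles` is a hypothetical helper, it takes opcode and destination names as plain strings, and it assumes that the instruction actually writes its destination (untaken conditions and non-writing opcodes such as CMPU/CMPS are ignored).

```python
def cycles(opcode: str, dest: str) -> int:
    """Clock cycles per instruction, following the rules described in the text."""
    count = 1                                   # most instructions: one cycle
    if opcode in ("LDCL", "LDCH"):
        count += 1                              # extra read of the instruction memory
    if dest == "PC" and opcode not in ("SET", "CALL"):
        count += 1                              # fetch from the new address, not PC+1
    return count


for op, dst in [("ADD", "R1"), ("ADD", "PC"), ("SET", "PC"),
                ("LDCL", "R1"), ("LDCL", "PC")]:
    print(op, dst, cycles(op, dst))
```

    The last case, LDCx writing to PC, is the one that stacks both penalties and is the reason a proper FSM is needed.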


    OTOH, the core state FSM has to receive orders from outside, latch them and send the appropriate commands to the other FSM. Reset is asserted there, since no "general RESET" signal would be propagated through the whole chip, further simplifying the implementation.

    This FSM receives a code from outside and latches it before interpretation. During debug, it deals with the usual start/step/stop commands, though it defaults to start after power-up. A fourth command can be RESET to force a reload of the program memory (if you change the source code to debug or if INV wants to switch to a different code page).

    So we get the following command codes :

    Command  Code  Description
    Start    11    Run the system.
    Step     10    Execute one instruction then pause the core.
    Stop     01    Pause the core at the end of the instruction.
    Reset    00    Pause the core and reset the FSM
                   (reload instructions when another command is received).

    The default command is Start when the core wakes up, and it can be modified (from input pins or through the debug interface).
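
    In a behavioural simulation, the four command codes map naturally onto an enumeration. The sketch below uses Python's `IntEnum`; the class and helper names are hypothetical, only the 2-bit values and the Start default come from the description above.

```python
from enum import IntEnum


class Command(IntEnum):
    """2-bit command codes of the execution FSM."""
    RESET = 0b00   # pause the core and reset the FSM
    STOP = 0b01    # pause the core at the end of the instruction
    STEP = 0b10    # execute one instruction then pause
    START = 0b11   # run the system


DEFAULT_COMMAND = Command.START   # what the core does when it wakes up


def latch_command(pins: int) -> Command:
    """Interpret the two low bits of an input value as a command."""
    return Command(pins & 0b11)


print(latch_command(0b10).name)
```

    A pleasant side effect of this encoding is that the all-zero code is Reset, so a debug port that has not been driven yet leaves the core paused rather than running.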

    The FSM can have more states to deal with initialisation. It is clocked by (and synchronised to) the core clock. Its value should be read back by the user after every command to ensure the proper value is latched and the FSM is in a coherent state.

    The five steps (so far) are:

    State  Description                                 How to get there
    Reset  Just do nothing. Clear PC and some          Receive the RESET command
           eventual SR flags.                          or external signal.
    Load   Copy external data to instruction memory,   Ext. Reset signal is
           increment PC for each new word.             de-asserted...

  • New decoder for the ALU

    Yann Guidon / YGDES - 01/02/2020 at 04:05 - 0 comments

    The ALU needs a new decoder because I changed the logic of the ROP2 unit... With a working decoder, I can re-implement the fault checker and validate all the opcodes.


    20200103 : I updated files here and there, as well as the main page that contains the "official definitions", because I have changed the order of the boolean opcodes.

    I must make a new lookup table that is more thorough than the one in "Bubble-pushing the ROP2" as well as the previous versions. There is a big difference: this time I use OR instead of MUX2 to combine the data so there are fewer degenerate cases.

           Func   NEG  PASS_en AND_en XOR_en OrXor_en  CLA_en CMPS  WB CryWr
    OR     0000     0      0      0      0       1       0     x    1    0
    XOR    0001     0      0      1      1       1       0     x    1    0
    AND    0010     0      0      0      1       0       0     x    1    0
    ANDN   0011     1      0      0      1       0       0     x    1    0
    CMPU   0100     1      0      1      x       0       1     0    0    1
    CMPS   0101     1      0      1      x       0       1     1    0    1
    SUB    0110     1      0      1      x       0       1     0    1    1
    ADD    0111     0      0      1      x       0       1     0    1    1
    SET    1000     x      1      1      x       0 pass  0     x    1    0
    CALL   1001     x      1      1      x       0 pass  0     x    1    0
    SH/SA  1010     x      0      0      0       0 clear 0     x    1    0
    RO/RC  1011     x      0      0      0       0 clear 0     x    1   b11 (RC)
    LDCL/H 1100     x      1      1      x       0 pass  0     x    1    0
    IN     1101     x      0      0      0       0 clear 0     x    1    0
    OUT    1110     x      0      0      0       0 clear 0     x    0    0
    INV    1111     x      0      0      0       0 clear 0     x    0    0
    

    Some equations :

    CLA_en = F2 & /F3
    ROP2 = /F3 & /F2
    F0F1 = F0 | F1
    /F1F0 = /F1 & F0
    /F1F3 = F3 & /F1
    CMPS = CLA_en & /F1F0
    CarryWrite = CLA_en | opcode=RC
    RegisterWriteback = /F2 | /F1F3 | (/F3 & F1)
    OrXor_en = NOR3( F1, F2, F3)
    XOR_en = F0F1 & ROP2
    NEG = ( ROP2 & F1 & F0) | (CLA_en & /(F1 & F0))
    Cin = NEG (more or less but also need to decode ADD 0 cond)
    PASS_en = /F1F3 & /(F2 & F0)
    AND_en = PASS_en | (/F3 & /F1F0) | CLA_en
    

    Maybe a 16×9 bits ROM would be better...
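    To compare the two options, here is a minimal sketch in Python (a behavioral model, not the project's VHDL) that transcribes the table as a 16×9 ROM and checks the equations against it. The names `ROM`, `decode` and `mismatches` are hypothetical. Two assumptions: the second NEG term is modeled as CLA_en & /(F1 & F0), which is the reading that matches the NEG column of the table, and the RC selection (b11) is passed in as an `is_rc` flag.

```python
X = None  # don't-care entry in the table

COLUMNS = ("NEG", "PASS_en", "AND_en", "XOR_en", "OrXor_en",
           "CLA_en", "CMPS", "WB", "CryWr")

# One 9-bit word per Func code, transcribed from the table above.
ROM = {
    0b0000: (0, 0, 0, 0, 1, 0, X, 1, 0),  # OR
    0b0001: (0, 0, 1, 1, 1, 0, X, 1, 0),  # XOR
    0b0010: (0, 0, 0, 1, 0, 0, X, 1, 0),  # AND
    0b0011: (1, 0, 0, 1, 0, 0, X, 1, 0),  # ANDN
    0b0100: (1, 0, 1, X, 0, 1, 0, 0, 1),  # CMPU
    0b0101: (1, 0, 1, X, 0, 1, 1, 0, 1),  # CMPS
    0b0110: (1, 0, 1, X, 0, 1, 0, 1, 1),  # SUB
    0b0111: (0, 0, 1, X, 0, 1, 0, 1, 1),  # ADD
    0b1000: (X, 1, 1, X, 0, 0, X, 1, 0),  # SET
    0b1001: (X, 1, 1, X, 0, 0, X, 1, 0),  # CALL
    0b1010: (X, 0, 0, 0, 0, 0, X, 1, 0),  # SH/SA
    0b1011: (X, 0, 0, 0, 0, 0, X, 1, X),  # RO/RC (CryWr only for RC)
    0b1100: (X, 1, 1, X, 0, 0, X, 1, 0),  # LDCL/H
    0b1101: (X, 0, 0, 0, 0, 0, X, 1, 0),  # IN
    0b1110: (X, 0, 0, 0, 0, 0, X, 0, 0),  # OUT
    0b1111: (X, 0, 0, 0, 0, 0, X, 0, 0),  # INV
}


def decode(func, is_rc=False):
    """Evaluate the boolean equations for a 4-bit Func code (F3..F0)."""
    f0, f1, f2, f3 = (bool(func >> i & 1) for i in range(4))
    cla_en = f2 and not f3
    rop2 = not f3 and not f2
    f0_or_f1 = f0 or f1         # "F0F1"
    nf1_f0 = (not f1) and f0    # "/F1F0"
    f3_nf1 = f3 and not f1      # "/F1F3"
    pass_en = f3_nf1 and not (f2 and f0)
    return {
        # Assumption: second term is CLA_en & /(F1 & F0), see lead-in.
        "NEG": (rop2 and f1 and f0) or (cla_en and not (f1 and f0)),
        "PASS_en": pass_en,
        "AND_en": pass_en or (not f3 and nf1_f0) or cla_en,
        "XOR_en": f0_or_f1 and rop2,
        "OrXor_en": not (f1 or f2 or f3),
        "CLA_en": cla_en,
        "CMPS": cla_en and nf1_f0,
        "WB": (not f2) or f3_nf1 or (not f3 and f1),
        "CryWr": cla_en or is_rc,
    }


def mismatches():
    """List (func, column) pairs where the equations contradict the ROM."""
    bad = []
    for func, word in ROM.items():
        signals = decode(func)
        for name, expected in zip(COLUMNS, word):
            if expected is not X and int(signals[name]) != expected:
                bad.append((func, name))
    return bad
```

    `mismatches()` should return an empty list when the equations agree with every entry that is not a don't-care, which makes it a cheap regression check whenever the opcode order changes again.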


Discussions

salec wrote 10/09/2019 at 09:18

YGREC can stand for so many things, but since my wife has been learning French on Duolingo I can't avoid noticing that it is also a wordplay on the French spelling of "Y".

:-)

Yann Guidon / YGDES wrote 10/09/2019 at 10:03

oh, of course, yes, too ;-)

salec wrote 10/09/2019 at 12:04

always have an opening joke/tease for audience :D

Yann Guidon / YGDES wrote 10/09/2019 at 12:46

@salec  always !

castvee8 wrote 04/13/2019 at 22:57

I so love your commitment and enthusiasm ! I was playing with vacuum tube calculators a bit since last year and just keep going down the rabbit hole. Your projects seem to at least make purposeful sense.

Yann Guidon / YGDES wrote 04/14/2019 at 08:56

That "purposeful sense" may look drowned in the proliferation of projects, angles and ideas, but it is still clear to me since this has been my main hobby since 1998 at least :-D

I'm glad you enjoy !

Yann Guidon / YGDES wrote 11/04/2018 at 07:11

Another note for later :
writing to A1 or A2 starts a fetch from RAM. In theory the latency is the same as instruction memory and one wait state would be introduced. However the processor can also write directly so the wait state would be only on read to the paired data register...

Yann Guidon / YGDES wrote 11/04/2018 at 06:55

Note for later : don't forget the transparent latch on the destination register address field, for the (rare) case of LDCx, because the 2nd cycle doesn't preserve the opcode etc.

Yann Guidon / YGDES wrote 11/04/2018 at 07:18

OK, not a transparent latch, but a DFF and a mux, plus some logic to control it.

-- DFF, every cycle :

SND_latched <= SND_field;

LDCx_flag <= '1' when (LDCx_flag='0' and opcode=opc_LDC and writeBack_enabled='1')   else '0';

-- MUX2 :

WriteAddress <= SND_latched when LDCx_flag = '1' else SND_field;

______

Note : LDCx into PC must work without wait state because it's connected directly to SRI, as an IMM8, and no extra delay is required. PC wait state is required for ADD/ROP2/SHL and IN.
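The DFF-plus-mux idea above can be checked with a cycle-by-cycle model. This is a hedged Python sketch, not the author's VHDL: the class name `WriteAddressSelector` is hypothetical, the value of `OPC_LDC` (1100) is an assumption that may not match the opcode map in use when this note was written, and only the write address is modeled, not the data path.

```python
OPC_LDC = 0b1100  # assumed opcode value for LDCx (hypothetical here)


class WriteAddressSelector:
    """Cycle-level model of SND_latched, LDCx_flag and the MUX2."""

    def __init__(self):
        self.snd_latched = 0    # DFF: SND field seen in the previous cycle
        self.ldcx_flag = False  # DFF: True during the 2nd cycle of LDCx

    def clock(self, snd_field, opcode, writeback_enabled):
        """Return the write address for this cycle, then update the DFFs."""
        # MUX2: in the 2nd cycle of LDCx the instruction word holds the
        # constant, so the destination comes from the latched copy.
        write_address = self.snd_latched if self.ldcx_flag else snd_field

        # DFF inputs, sampled at the clock edge that ends this cycle.
        next_flag = ((not self.ldcx_flag)
                     and opcode == OPC_LDC
                     and writeback_enabled)
        self.snd_latched = snd_field
        self.ldcx_flag = next_flag
        return write_address
```

Because the flag clears itself after one cycle, whatever bits sit in the opcode field during the second cycle cannot retrigger the sequence, even if they happen to look like LDCx.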

Frank Buss wrote 10/27/2018 at 12:51

Do you really plan 8 byte-wide registers? This would require thousands of relays :-)

Yann Guidon / YGDES wrote 10/27/2018 at 14:26

no :-)

8 registers, 8 bits each = 64 storage bits.
1 relay per bit => 64 relays


The trick is to use the hysteretic mode of the relays :-)

Frank Buss wrote 10/27/2018 at 16:17

Ok, makes sense. Maybe change the project description, someone might think you are planning a 64 bit architecture.
BTW, could this be parametrized for the address and data size? If you implement it in VHDL, you could use generics for this, would be no additional work to use just the generic names instead of hard coded numbers. Except maybe some work for extending the instruction opcodes.

Yann Guidon / YGDES wrote 10/27/2018 at 17:16

Frank : DAMNIT you're right !

I updated the description...

Yann Guidon / YGDES wrote 10/27/2018 at 17:19

For the parameterization : it doesn't make sense at this scale. Every fraction of bit counts and must be wisely allocated.

Larger architectures such as #YASEP Yet Another Small Embedded Processor and #F-CPU have much more headroom for this.

Bartosz wrote 11/08/2017 at 16:40

Will this work on epiphany or oHm or another cheap machine?

Yann Guidon / YGDES wrote 11/08/2017 at 18:07

I'm preparing a version that would hopefully use less than half of an A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !

Bartosz wrote 11/14/2017 at 14:13

The question is the price and the possibility to buy

Yann Guidon / YGDES wrote 11/14/2017 at 16:08

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCB and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
