
YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

YGREC can stand for many things, such as "YG's Relay Electric Computer", "Yann's Germanium and Relay Equipped Computers" or "YG's Ridiculous Electronic Contraption". You decide !

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of Data RAM (plus parity ?)
* 8 registers, 8 bits each (including PC)
* fewer relays/gates than the YGREC16
This core is so simple that I focus now on other issues, such as the debug/test access port, the register set's structure, I/O, power reduction...
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA, ASIC, you name it (as long as it uses boolean logic)!

After the explorations with #YGREC-РЭС15-bis, I reached several limits and I decided to scale it down as much as possible. And this one will be implemented both with relays and VHDL, since the YGREC8 is a great replacement for Microchip's PICs.

A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is now:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

The instruction word is shrunk down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field. Speed should be decent, with a pretty short critical datapath, and all the instructions execute in one clock cycle (except the LDCx instructions and computed writes to PC).

The fields have evolved with time (I have tried various locations and sizes). For example:

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...
20181024: changed the names of some fields
20181101: modified the conditions to change Imm3 into Imm4
20180112: Imm9 back again !

There are 18 useful opcodes (plus INV, and the pseudo-opcodes HLT and NOP), and most share two instruction forms : either an IMM8 field, or a source & condition field. The source field can be a register or a short immediate field (4 bits only but essential for conditional short jumps or increments/decrements).

The main opcode field has 4 bits and the following values:

Logic group :

  • OR
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware : There is no point in adding 0, so ADD with a short immediate (Imm4) skips the value 0 and the range is now from -8 to -1 and +1 to +8. (see 17. Basic assembly programming idioms)
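The skip-zero rule can be pictured with a short Python sketch. The bit-level encoding used here is an assumption (a 4-bit two's-complement field whose non-negative codes are shifted up by one), and `decode_add_imm4` is a hypothetical name; the log referenced above is the place to check the real encoding.

```python
def decode_add_imm4(field: int) -> int:
    """Map a 4-bit field to an ADD operand in -8..-1, +1..+8 (assumed encoding)."""
    if not 0 <= field <= 0xF:
        raise ValueError("Imm4 field must fit in 4 bits")
    # Read the field as 4-bit two's complement: 0..7 stay, 8..15 become -8..-1.
    value = field - 16 if field & 0x8 else field
    # Assumption: non-negative codes are bumped by one so that 0 never appears.
    return value + 1 if value >= 0 else value

# All 16 codes, sorted, to show that the range has a hole at 0.
operands = sorted(decode_add_imm4(f) for f in range(16))
```

With this mapping the 16 codes cover 16 useful operands instead of wasting one on a do-nothing ADD 0.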

Shift group (optional)

  • SH/SA : direction is the sign of the shift, I/R (bit 9) is the Logic/Arithmetic flag.
  • RO/RC : direction is the sign of the shift, I/R (bit 9) allows the carry to be rotated.

Control group:

The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available. All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S (Sign, MSB)
  • B0, B1, B2, B3 (for register-register form, we can select 4 bits to test from user-defined sources)

Instruction code 0000h should map to NOP, and the NEVER condition, hence ALWAYS is coded as 1.

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism): condition is implicitly ALWAYS because it's an IMM8 format.
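Here is a rough Python sketch of how such a condition code could be evaluated. Only two facts come from the text: code 0 means NEVER and code 1 means ALWAYS. The rest is an assumption made for illustration: bit 0 is taken as the polarity bit (0 = negated), the upper bits select the source, and the order of the sources in `SOURCES` is hypothetical.

```python
# Hypothetical ordering of the condition sources; only "ALWAYS" being
# source 0 is implied by the text (code 0 = NEVER, code 1 = ALWAYS).
SOURCES = ["ALWAYS", "Z", "C", "S", "B0", "B1", "B2", "B3"]

def condition_met(cond: int, flags: dict) -> bool:
    """Evaluate a 4-bit COND code against a dict of flag values."""
    polarity = cond & 1                      # assumed: 1 = as-is, 0 = negated
    source = SOURCES[(cond >> 1) & 0x7]      # assumed: upper 3 bits select the source
    raw = True if source == "ALWAYS" else bool(flags.get(source, False))
    return raw if polarity else not raw
```

In this sketch the 3-bit form used with Imm4 would simply restrict the selector to the first four sources.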

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.


This project is more than an ISA or one implementation : the goal is to become a platform. See log 82. Project organisation

Logs:
1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Coloration syntaxique pour Nano
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing the core
24. Synchronous Serial Debugging
25. MUX trees
26. Flags, PC and IO ports
27. Binary translation (updated)
28. Even better register set
29. A better relay-based MUX64
30. Register set again
31. Rename that...


YGREC8_VHDL.20200825.tgz

some syntax highlighting, new SR FF macros

x-compressed-tar - 351.15 kB - 08/25/2020 at 04:42


YGREC8_VHDL.20200821.tgz

TAP's Selector unit OK

x-compressed-tar - 349.16 kB - 08/22/2020 at 00:52


YGREC8_VHDL.20200815.tgz

TAP bit scrambling order is working again

application/x-compressed-tar - 301.32 kB - 08/16/2020 at 01:53


YGREC8_VHDL.20200814.tgz

Back to Gray6s...

x-compressed-tar - 297.88 kB - 08/15/2020 at 04:45


x-compressed-tar - 278.49 kB - 08/13/2020 at 03:24



  • A tale of Flip-Flops

    Yann Guidon / YGDES, 08/23/2020 at 02:42

    The log 120. TAP v.2 introduced the basic feature of the new circuit : the same counter is used for both phases (in and out) of the communication cycle. This means that it can't use a "normal" counter with its RESET pin and the circuit has been redesigned with partially asynchronous features. The result is this circuit :

    But in practice, this creates some issues in simulation, both with VHDL and circuitjs. The culprit is the infamous double-NOR2 latch that creates oscillations at the start of the sim when both inputs are cleared.

    With circuitjs the oscillation stops after "a while", and I have no idea why :

    But with GHDL the sim stops before it begins because it can't reach a stable initial state.

    And now look at this version:

    This one has no inversion so no unstable state and no oscillation. This circuit is less favoured because the AND and OR are "2nd order gates" and each requires an additional inverter. But in CMOS there is an extra trick !

    The gate OAN21 (or OA1 in Actel parlance) uses 8 transistors : 6 for the function itself and 2 for the final inverter. One inverter stage is saved thanks to the nested topology :

    Meanwhile, each NOR2 (or NAND2) already takes 4 transistors, for a total of 8 transistors for the flip-flop as well, but with possible oscillations. The choice seems obvious in this case.

    There is another difference though : the CLR signal's polarity is negated. Usually it is not a problem because the signal can be negated one way or another upstream. In the above example, I simply tied the XOR to the inverted output of the DFF but other solutions exist, such as an XNOR.

    And there is also a more direct solution, with the OA21NB gate (or OA1B for Actel) where a little swap trick saves another inverter :

    So the Gray6s unit can be updated.
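The difference between the two latch styles can be reproduced with a toy unit-delay model in Python. This is only a sketch under a strong assumption: every gate has the same delay and all gates update at the same instant, which is roughly what an idealised simulator does at time zero. The function names are made up for the illustration and this is not the VHDL from the archive.

```python
def nor(a: int, b: int) -> int:
    """2-input NOR on 0/1 values."""
    return 0 if (a or b) else 1

def simulate_nor_latch(s, r, q, qb, steps=6):
    """Cross-coupled NOR2 pair: both gates update at the same time."""
    trace = [(q, qb)]
    for _ in range(steps):
        q, qb = nor(r, qb), nor(s, q)
        trace.append((q, qb))
    return trace

def simulate_oa_latch(s, clr_n, q, steps=6):
    """OA1-style latch: Q = (S OR Q) AND CLR_n, with CLR active low."""
    trace = [q]
    for _ in range(steps):
        q = (s | q) & clr_n
        trace.append(q)
    return trace

# Both inputs cleared, both NOR outputs starting from the same value.
nor_trace = simulate_nor_latch(s=0, r=0, q=0, qb=0)
# Same idle inputs for the OA latch, from either initial state.
oa_trace_0 = simulate_oa_latch(s=0, clr_n=1, q=0)
oa_trace_1 = simulate_oa_latch(s=0, clr_n=1, q=1)
```

In this model the NOR pair flips between (0,0) and (1,1) forever, while the OA latch just holds whatever value it started with, because its single feedback path has no inversion.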


    This raises a big question though. R/S flip-flops are used in other places (such as the Selector) and this "bare metal" construct can't be analysed by my library tools because logical loops are explicitly considered as "zombies". They must be explicitly abstracted and this means I can't just replace the 2×NOR2 with one OA1B and call it a day.

    Let's look at the A3P vocabulary : there are no RS FF but it contains these T-latch gates

    "DLI0    ", "DLI1    ", "DLN0    ", "DLN1    ",
    "DLI0C0  ", "DLI1C0  ", "DLN0C0  ", "DLN1C0  ",
    "DLI0C1  ", "DLI1C1  ", "DLN0C1  ", "DLN1C1  ",
    "DLI0P0  ", "DLI1P0  ", "DLN0P0  ", "DLN1P0  ",
    "DLI0P1  ", "DLI1P1  ", "DLN0P1  ", "DLN1P1  ",
    "DLI0P1C1", "DLI1P1C1", "DLN0P1C1", "DLN1P1C1"
    

     This looks confusing but there is some kind of logic in this madness. C means Clear and P means Preset, the following digit gives the active level.

    • We can already forget the first line because these gates have only a data and clock input.
    • Other gates with either C0/C1 or P0/P1 are more useful when a fixed datum is input and the extra pin forces the negative state. There are 16 cases and many degenerate ones...
    • The last line with both C and P is interesting because it gives both active-1 Clear and Preset as well as an inverted output (if needed).
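The naming scheme is regular enough to be parsed mechanically. The following Python sketch extracts only the Preset and Clear levels, since those are the only parts explained here; the leading `DLI0`/`DLN1` part is kept as an opaque prefix, and `parse_latch_name` is a hypothetical helper, not part of any vendor tool.

```python
import re

# DL + I/N + digit, then optional P<level> and C<level>. Only the P and C
# suffixes are interpreted; the prefix is returned untouched.
_NAME = re.compile(r"^(DL[IN][01])(?:P([01]))?(?:C([01]))?$")

def parse_latch_name(name: str) -> dict:
    """Split a latch macro name into prefix, preset level and clear level."""
    m = _NAME.match(name.strip())   # the names in the list are space-padded
    if m is None:
        raise ValueError(f"unrecognised latch name: {name!r}")
    prefix, preset, clear = m.groups()
    return {
        "prefix": prefix,
        "preset_level": None if preset is None else int(preset),
        "clear_level": None if clear is None else int(clear),
    }
```

For example, `parse_latch_name("DLN1P1C1")` reports an active-1 Preset and an active-1 Clear, which is the combination singled out in the last bullet.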

    There are many ways to build a Set/Reset or Reset/Set gate from these macros but we're not there yet. So let's now enumerate the desirable cases and build a table of the required conditions !

    Precedence  Set level  Reset level  Macro name  Mapped to
    Set         0          0            S0R0        AO1A
    Set         0          1            S0R1        AO1C
    Set         1          0            S1R0        AO1, AON21
    Set         1          1            S1R1        AO1B, DLI1P1C1, AON21B
    Reset       0          0            R0S0        OA1A
    Reset       0          1            R1S0        OA1C
    Reset       1          0            R0S1        OA1, OAN21
    Reset       1          1            R1S1        OA1B, DLN1P1C1, OAN21B

    The precedence is given by the gate that drives the output (AND for Reset, OR for Set). Conveniently the wsclib013 library also provides the reciprocal gates AON21 and AON21B so 4 combinations are directly available in ASIC (if the B or a2 input is fed from the output).
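A small behavioural model makes the precedence rule concrete. This Python sketch is an illustration only: `sr_next` and `macro_name` are hypothetical helpers, and the two equations are the usual AND-OR and OR-AND forms, assumed to match the gates named in the table.

```python
def sr_next(q, set_in, reset_in, set_level=1, reset_level=1, precedence="Set"):
    """Next state of a Set/Reset cell with configurable active levels."""
    s = int(set_in == set_level)        # 1 when Set is asserted
    r = int(reset_in == reset_level)    # 1 when Reset is asserted
    if precedence == "Set":
        # OR drives the output: Q = S OR (Q AND NOT R)
        return s | (q & (1 - r))
    # AND drives the output: Q = (S OR Q) AND NOT R
    return (s | q) & (1 - r)

def macro_name(precedence, set_level, reset_level):
    """Macro name as used in the table: the dominant input comes first."""
    if precedence == "Set":
        return f"S{set_level}R{reset_level}"
    return f"R{reset_level}S{set_level}"
```

When both inputs are asserted at once, the Set-precedence cell ends at 1 and the Reset-precedence cell ends at 0; in every other case the two behave the same.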

    Other cases could be built from the inverting versions (AOIxx, OAIxx, exercise left to the needy user).

    This is cool but this is still not the end of the story because I still can't analyse these constructs. One way is to reuse the...


  • TAP pins

    Yann Guidon / YGDES, 08/22/2020 at 23:03

    Let's now talk about the physical interface to the TAP :-)

    As already mentioned, this TAP is designed to

    1. interface directly to a SPI master interface
    2. require as few pins as possible
    3. be as simple as possible so the host SW can do all the smart stuff

    The typical application uses something like the Raspberry Pi and a half-duplex link with 3 signal wires :

    • /WR input selects the data direction (0: write, 1 read from the core)
    • Dat : multiplexed serial data in and out signal
    • CLK input

    The debug system could also access the /RESET pin but this should not be necessary.

    /WR, CLK and Dat should have a weak pull-down resistor to prevent any spurious activity when the debugger is not connected.

    However the state at these pins could be sampled by the Y8 during Power-On Reset (P.O.R.) to control the state when /RESET is released.

    • /WR should not be high on its own because any unsuspecting device connected to the Dat pin could also be a driver and result in a "driver conflict". Thus /WR high would mean a debugger is actively connected and requesting to take over the pins.
    • CLK is also usually low but the counter has "a condition" where the synchroniser oscillates during initialisation. Oscillations are removed when CLK is high at startup and it has no noticeable effect on the initialisation sequence.
    • Dat should be weakly pulled to 0 because you never know what state would be output when /WR goes high, and a string of 0s is/will be a command for internal TAP reset.

    The TAP interface could be implemented in a familiar 4-wire way, as a traditional SPI slave :

    But since Din and Dout are not active simultaneously, a little tristate buffer circuit can save one pin on the TAP's side :

    Ideally, with a microcontroller for example, the /WR signal is driven by a GPIO pin. However the Raspberry Pi has "some latency" (around one microsecond in some cases) which could be saved with another trick : use both SPI slave select pins, yet use only one CE pin. This also helps when dealing with the proper polarity, since /WR is active on both levels but Slave Select is usually active low.

    Multiple TAPs could be accessed with the same port because the interface requires both /WR and CLK to work. If one of them, or both, is disabled (with a demultiplexer, 74HC238 or 239 as in the example below), commands can not be interpreted.

    If /WR can go high for one TAP only, then the other TAPs will float their Dat pin, which implements the desired multiplexing. If CLK is not demultiplexed as in the above example, make sure to start all the new communications with a blank byte, which ensures that all the other TAPs will not interpret the data as valid commands.

  • TAP v.2's selector

    Yann Guidon / YGDES, 08/17/2020 at 16:33

    You might remember the Selector unit from the log 114. The TAP selector: 19 Flip-Flops ! Of course these gates were used and shared but that looks a bit excessive... Using a "prefix approach" lets me reuse the gates so one set of 5 DFF is enough to store the group select prefix and the unit-specific command.

    Once the "Group Select Prefix" is received, it must be decoded. As with the first version, only codes 001 to 110 are decoded so there is an expansion rate of 2 : going from 3 input bits to 6 output bits. It would then make sense to latch the 3 encoded bits but there are other considerations : if only 2 or 3 units are implemented, it doesn't make a difference. For now I favour the approach where the decoded signal is latched, so it could be located close to, or inside the addressed unit.

    But logically, the new Selector unit has inputs from the pins and the Gray6s counter, and outputs a latched "Group Select Prefix" vector with a desired width. I tried to test the circuit with Falstad's circuitjs and saw that I couldn't go far without having the whole Gray6s circuit where I could tap in some already existing signals. The test circuit is quite large (and I must use minified links now) but only because of the Gray6s unit :-)

    I succeeded in turning the selected output ON but some questions remain, as seen in the red-encircled areas.

    • On the left : the original circuit expected the clock to go low before /WR changes.
    • On the right : can this circuit be reduced to a simpler Set/Reset circuit ? (There are some spurious signals that seem to prevent it)

    Anyway it is looking smaller than the first Selector and the decoding gates can be easily reused by the addressed units :-)


    Well the problem on the left is solved with a MUX2 / T-latch :-)

    There is no need to connect the clock gate to the above Set/Reset latch.

    However : this works ONLY if the clock level after and before the /WR change is the same. This still creates a spurious spike when

    • CLK = 0 when /WR goes up and
    • CLK = 1 when /WR goes down

    But this should not happen, right ?


    The new circuit is here :-)

    This helps to define the modifications required for the Gray6s unit :

    • Less4 is now NOR3 that counts /WR as input.
    • nobyte and SelB3 can be done externally, in the Selector
    • SelB3 needs OV(1)

    The output latch can indeed be reduced to a Set/Reset cell, here made with a OA1 but a pair of NOR2 could work as well. The rationale for this choice :

    • it takes much less silicon
    • The spurious pulse comes after the first normal pulse due to an interaction with the clock. If the latch is already set, there is no harm to set it again one cycle later (it doesn't change the state).

    The state changes as soon as the 3rd clock pulse is received, which could create issues in the other units. I'm testing if/how I can activate it on the falling edge of CLK.

    And here is the Selector with 5 decoded outputs:

    I know I should have factored the decoder's inverters but it's only an illustration that will be refined.

    The duplicated NOR3s with SelB3 and CLK should also get a special treatment to reduce the logic and wiring... So there it is !

    The predecoders are also shared with the rest of the TAP to reduce efforts&complexity. These predecoders are also provided for the other bits of the shift register :-)


    Speaking of predecoders : another signal that the other circuits will need is when the first byte is shifted in. I modified the existing circuit by adding 2 gates :

    • "firstbyte" is much like nobyte, but instead of checking 3 Gray bits for '0' (which gives a granularity of 4 counts), the  DFFa(2) signal is traded for the CLK input to remove the glitch further downstream (and the granularity is 8 counts).
    • CMD_en goes high at the end of the 7th clock pulse, ready for being latched in the 8th pulse. It's a bit like FB (Full Byte) but restricted to the first byte only.

    preFB is glitchy because it uses logic results from the counter, but it seems that gating...


  • TAP timing & simulation

    Yann Guidon / YGDES, 08/16/2020 at 01:58

    As I rebuild the tests for the latest archive YGREC8_VHDL.20200815.tgz, I find a weird behaviour : the last test of the TAP takes what feels like ages to complete.

    The usual procedure, after you decompress the archive, is to run the script ./test_units.sh which will ensure everything is OK (tools, files, regression tests and so on). Self-tests are run, one after another, and I don't use a makefile because it tends to obfuscate things. If I wanted to use all the CPU I could parallelise some parts but for now, I don't run ALL the possible tests, often because they are pointless. There are various versions of the same unit and some are not relevant.

    For the TAP, I have the "behavioural" version of the Gray6s counter, and the "tiles" version, both work as intended. The test.sh script works nicely, with behavioural completing in no time, while the Tiles takes 1/2s. But it appears that the Tiles version has some unexpected side-effects when used along with the MUX64 and I can't find why. test_readback takes more than 7s to complete ! After tweaking the order of the initialisation lines, that time is halved but the 3.5s is still too much for doing nothing (I tried to find where/how that time was wasted but it still eludes me). I want the whole script to run fast and smoothly, I'm not running an exhaustive test but thorough enough that one can be confident to explore the rest of the options with a working system.

    For now the solution is to run the piso_order and test_readback programs with the behavioural counter. I suspect it could be an obscure VHDL thing with the "discrete" R/S latch of the resynch circuit but I can't be sure yet. Anyway the purpose is to run decently fast simulations and the Tiles version is not required at that point, as long as the behaviour matches the higher-level description.


    Apparently the gates library adds some significant overhead when the number of gates increases. The TAP simulates faster when the "simple" build is chosen, without the fancy stats stuff. However regeneration of the library would take almost as much time as running the system with its about 100 gates. A tiles-less version is chosen for the TAP's first tests, but other options are possible (commented out in the scripts). Behavioural simulation is fast enough anyway in the first checks. More detailed simulations can be performed later...

    The archive is re-uploaded and the ./test_units.sh script completes in < 11s.


    Oh, it just dawns on me that I could compile the "simple" gates library in a different directory, with the same name, and select the version by the directory path... This is now explained in Abnormal initialisation time and workaround. and the archive is updated with the new "dual style" system.

  • TAP v.2 : where it's going

    Yann Guidon / YGDES, 08/13/2020 at 22:16

    The MUX64 is unchanged, the Gray7s circuit is redesigned, the Instruction Slice remains quite the same... The new general diagram shows that we're left with the redesign of the Execute module :

    There is one gotcha though, compared to the behaviour of the previous versions :

    If you read the MUX beyond the 64 bits, the Gray counter will loop and wrap around in the reverse direction, before going in the forward direction again, and so on...

    I might have to reduce the size again because it seeeeeems I might not use the whole counter range. See below.


    The timing diagrams have not changed much because the TAP is still driven by the same type of devices (microcontrollers or SBC with a hardwired byte-wide SPI master). So sending a single byte will look like this :

    There is a required "little dance" on /WR and CLK after powerup to ensure that the internal state is determined:

    Strobe CLK up-down (at least once, could be 8 times), then /WR up-down, and you can read and write as expected (just make sure the serial pin is not driven when /WR is up). Other registers will need to be initialised as well, as there is no real "reset" pin.


    There is a significant change on the horizon though. I'd like to use a prefix (byte ?) to also simplify the Selector (it has disappeared on the main diagram). The first byte could be a "short command" to the FSM for example, but for longer messages it also steers the data to the right shift register. Some clock resynch is required for these sub-clocks to prevent a spurious rising edge but I think I found a solution tonight.

    Furthermore I have split the first byte into the prefix and the command. This saves some gates because the prefix is latched first, then the command can reuse the same DFF gates. With 3 bits of prefix and 5 bits of command, only 5 bits need to be stored by the new "selector". Furthermore, since decoding logic is shared between the prefix and the command, some further gates could potentially be saved. Then, only 6 units can be selected because prefixes 000 and 111 are reserved (it can reuse the same decoding logic as the previous selector). To save further on gates, each unit will contain its own selector latch and logic.
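The split of the first byte can be sketched in a few lines of Python. The bit positions are an assumption (prefix in the 3 upper bits, which fits a prefix that is shifted in first on an MSB-first link), and so is the mapping of prefix 001 to select line 0; `split_first_byte` and `decode_prefix` are hypothetical names.

```python
RESERVED_PREFIXES = (0b000, 0b111)

def split_first_byte(byte: int):
    """Return (prefix, command), assuming the prefix sits in the 3 upper bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a byte")
    return byte >> 5, byte & 0x1F

def decode_prefix(prefix: int) -> int:
    """One-hot select over 6 units; reserved prefixes select nothing."""
    if not 0 <= prefix <= 0b111:
        raise ValueError("prefix must fit in 3 bits")
    if prefix in RESERVED_PREFIXES:
        return 0
    return 1 << (prefix - 1)    # assumed: 001 -> line 0, ..., 110 -> line 5

# The 6 usable prefixes each activate exactly one select line.
select_lines = [decode_prefix(p) for p in range(1, 7)]
```

The reserved codes 000 and 111 leave all 6 select lines idle in this model.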

    I like it because that is one internal state fewer to init and/or keep in mind when programming. The command is the state and the system needs fewer cycles to get into a nominal functioning state (and it uses fewer gates). It is less resilient though and the commands must be carefully documented because there is no ASCII mnemonic.

    The other benefit is with the timing : the serial interface generates the pulses that were hard to come up with the first version. The /WR pulse can be used to strobe transparent latches, for example.

    As a result : most of the commands that were defined so far in 116. TAP summary & protocol will be either 1 or 3 bytes long...

  • Updated Gray Counter

    Yann Guidon / YGDES, 08/10/2020 at 10:42

    20200814 : back to 6 bits because it looks unlikely I'll use 16 byte-long messages. Look at this circuit :-)

    Total count : 39 gates :-)
    The rest of this log is still very informative.

    With the new TAP v.2, I reconsider the detailed design of the whole circuit and merge the two counters into one. This means that I must remove the /RESET input of the DFFs, which in fact are not desired because basic ASIC gates don't have one anyway. I must also increase the size of the counter a bit and add a SAT output (plus some pre-decoded bits such as FB or NULL). With these enhancements, the same counter can drive both the MUX tree for Dout and the other decoders for Din.

    The log 109. Gray counter explains all the details of the construction of a modular/cascaded Gray counter, check it out if you haven't seen already !

    From there the first step is to expand the counter to 7 bits and add a saturation bit :

    Then the DFF with RESET must be substituted with a DFF and a AND2 gate.

    Let's start with the MSB : SAT and B6

    The funny thing is that the AND and XOR gates can be understood as a half adder, because B6 is toggled every time OV is on, and SAT is enabled when both OV and DIR are on. This could help simplify a bit if a "half adder" gate is available in the ASIC PDK but H2 seems best merged with the following OR2.
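Written as plain boolean equations, the MSB stage described above is tiny. This Python sketch only models the combinational part that feeds the DFFs, and the function name `msb_stage` is made up for the illustration.

```python
def msb_stage(b6: int, ov: int, direction: int):
    """Combinational next values for the MSB module: (next B6, SAT input)."""
    next_b6 = b6 ^ ov           # B6 toggles every time OV is on
    sat = ov & direction        # SAT is enabled when both OV and DIR are on
    return next_b6, sat
```

The XOR and the AND share the OV input, which is where the resemblance with a half adder comes from.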

    The DFFs have no RESET input as expected. The SAT output could even drop the DFF but it would be ON during cycle 127 and not 128, thus reducing the usefulness of the whole circuit. The DFF delays the flag by one cycle and allows the use of the full 16 counts.

    The middle module(s) have nothing specific to be said about...

    I simply added the AND2 at the output of the DFF and removed the RESET pin.

    Same for the LSB : it's a simple adaptation.

    The modules are gathered in this link so they can be reused and adapted later for other eventual purposes.

    I hope it will be useful to others ;-)

    The whole counter is there : it looks like such a mess that I'm glad it's modular ;-)

    And it works nicely when driven by the circuit described in 120. TAP v.2 :

    (of course this is not the typical way to use it but it works anyway)

    Now, writing it in VHDL is another story.

    Stay tuned !


    Oh I almost forgot ! The earlier Counter unit has a FB output that is needed in most of the other circuits. It turns out it's quite easy to generate but not as I originally thought : just AND the DIR and OV signals from the LSB module. The circuitjs diagram shows that the result of the AND is a bit glitchy on the 4th (mod 8) cycle but the DFF resynchs the signal. The AND result is provided as a partially decoded signal preFB, in case it's needed in other places...

    The trick is to ensure that the /WR toggles work as intended, there is a AND after the DFF, and there is no need to AND before because the OV and DIR signals are already ANDed anyway.

    It is also useful to provide pre-decoded flags for when the byte count is low. I added the Less4 output signal that is a NOR3 of SAT, b6 and b5, such that it is 1 when the count is less than 4 bytes.
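It is easy to check with a few lines of Python that a NOR of the two upper Gray bits really flags "less than 4 bytes". The sketch assumes that the counter follows the standard reflected Gray sequence and advances once per clock pulse (so 8 counts per byte); the function names are hypothetical.

```python
def to_gray(n: int) -> int:
    """Standard reflected binary Gray code."""
    return n ^ (n >> 1)

def less4(count: int, sat: int = 0) -> int:
    """NOR3 of SAT, b6 and b5, taken from the Gray-coded 7-bit count."""
    g = to_gray(count & 0x7F)
    b6, b5 = (g >> 6) & 1, (g >> 5) & 1
    return 0 if (sat or b6 or b5) else 1

# One clock pulse per bit, so 4 bytes correspond to 32 counts.
matches = all(less4(n) == int(n < 32) for n in range(128))
```

The flag needs no conversion back to binary : the top Gray bit equals the top binary bit, and the next Gray bit is the XOR of the two top binary bits, so both are 0 exactly when the count is below 32.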

    As the circuit has grown beyond the linking capacity of the site, I saved the description as Gray7s-fb-l4.cjs in the archive.

    The whole thing is pretty large, now...


    VHDL implementation was not difficult, thanks to the previous version and all the planning that is logged on these pages. It compiled (almost) right away and thanks to rigorous checks during the writing, only one small numbering mistake remained and was easily spotted.

    Total count : 46 gates (incl. resynch, sat, FB), while the earlier Counter was 31 and Gray6 was 21 (and with special DFF with Reset). So the net gain is 6 gates but there are 16 DFF that have been replaced with a smaller version without RESET.

    This new version will greatly ease the design of the other modules !

  • Synthesis checks

    Yann Guidon / YGDES, 08/07/2020 at 22:09

    I tried to run my new code through Synplify (in the Libero SOC suite) and got some interesting results.

    First :

    I finally understand how to create and use external libraries, in particular the SLV lib worked right out of the box, after I searched for the right method. It's one of those dumb, painful GUI clickodromes that look nice during a presentation but are not possible to automate... Anyway, SLV_utils.vhdl was added smoothly.

    Second :

    I forgot an important "detail" about how Synplify wants its external entities : "old style"... So I had to adapt/modify a lot of lines. Nothing changed except the syntax. It's more verbose, you have to add a declaration for each block you use... But now it works.

    Third :

    I could check, verify and compare the behaviour of the synthesiser with various versions of one unit.

    In particular I verified that the "balanced control tree" approach is beneficial compared to the dumb/usual approach. Log 25. MUX trees gets a graphical update :-)

    Oh and I found how to manually place & lock gates, so here is one test with INC8 :-)

    The system was not able to optimise this unit more so I guess I'm not far from a great design.

    Now, I just have to find how to generate these coordinates with a program and send them to the tool...

    Finally :

    All the modified and/or tested VHDL files have been re-integrated into the code tree with the following line :

    -- SYNTHESIS OK

    So it's easy to check/list all the final files with grep, and separate them from the simulation-only files :-)
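grep does this in one line, but the same listing can also be scripted. Here is a minimal Python sketch, assuming the sources use the `.vhdl` extension and that the marker appears on a line of its own; `synthesis_ok_files` is a hypothetical helper, not a script from the archive.

```python
from pathlib import Path

MARKER = "-- SYNTHESIS OK"

def synthesis_ok_files(root):
    """Return the sorted .vhdl files under root that contain the marker."""
    found = []
    for path in sorted(Path(root).rglob("*.vhdl")):
        text = path.read_text(errors="replace")
        if any(MARKER in line for line in text.splitlines()):
            found.append(path)
    return found
```

Calling `synthesis_ok_files(".")` from the top of the code tree would list the files that went through synthesis, and every other `.vhdl` file is then simulation-only.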

    I have gained more insight, refreshed my skills and proved that my method works.



  • TAP v.2

    Yann Guidon / YGDES, 08/04/2020 at 22:20

    As I near completion of the design of the TAP system, I realise I have harder and harder timing problems to solve... And I could even save some gates !

    The Counter has 32 gates. The Gray counter has 17. Both count on CLK's rising edge and mayyyybe... they could be merged ?

    It's not difficult to adapt the Gray counter to provide the additional signals FB (Full Byte : just a NOR3) and SAT (an added DFF and a couple of gates), as well as individual decoded size output signals for 1 to 8 bytes (though so far only 2 and 4 are used). Overall, the Gray counter would expand to maybe 30 gates, which would overall save maybe 20 gates compared to the split design, and the DFF would not have a RESET input, which might be absent in ASIC gates libraries (and use more silicon)...

    The harder part though is the transition between the 2 phases and the reset of the counters. I think I have an idea but it will force me to deconstruct the Gray counter into a more traditional logic+DFF system, because... it will become a classic digital sequential circuit, with the current state and the expected new value that may (or may not) be latched.


    20200808 :

    Some more thoughts gave this result :

    This easily solves the question of using the counter with BOTH phases, at the cost of 1×DFF, 1×XNOR and 1×AND2 per counter DFF.

    Here is how it should look with wavedrom :

    This does not solve however the case where /WR is toggled up and down without CLK activity. The DIFF internal signal should be "sticky" and go back down when CLK has a rising edge...

    There, it should work now :-) (ok it doesn't because of a race condition with the clock)

    Each time /WR changes, the added DFF is RESET, and later set again by a positive edge on CLK. The question now is how to emulate this DFF with individual gates. The following circuit seems to work well and is adapted to ASIC implementation :

    The two NOR2 use little surface on a die. The output inverter works as a buffer. There is no oscillation condition, as proved in the above trace : the SET has precedence over CLR, which avoids the race condition found with the initial idea using a DFF. The initial value of the latch is determined by toggling the /WR and CLK pins and a short initialisation sequence brings the circuit to a known state:

    • bring /WR low
    • pulse CLK once (at least, could be more) => first DFF state is known
    • set /WR high => changes XOR => CLR the SR latch
    • pulse CLK once (at least, could be more) => SET the SR latch

    The output data can be ignored, shift 0s in to make NOPs (just in case). So this could be summed up as : shift a NUL byte in, then shift a dummy byte out.
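On the host side, the sequence above could be generated like this. It is a sketch with made-up names : the pin labels "WR", "CLK" and "DAT", the `None` level meaning "release the pin", and the `write_pin` callback are all assumptions, to be replaced with whatever GPIO or SPI layer is really used.

```python
def tap_init_sequence(clk_pulses: int = 8):
    """Return the (pin, level) steps of the initialisation described above."""
    if clk_pulses < 1:
        raise ValueError("at least one CLK pulse is required per phase")
    steps = [("DAT", 0), ("WR", 0)]                  # shift 0s in, /WR low
    steps += [("CLK", 1), ("CLK", 0)] * clk_pulses   # first DFF state is known
    steps += [("DAT", None), ("WR", 1)]              # release Dat, then /WR high
    steps += [("CLK", 1), ("CLK", 0)] * clk_pulses   # SET the SR latch
    return steps

def run(steps, write_pin):
    """Apply the steps through a caller-supplied write_pin(name, level)."""
    for name, level in steps:
        write_pin(name, level)
```

With the default of 8 pulses per phase this is the "NUL byte in, dummy byte out" summary; a logging function passed as `write_pin` is enough to dry-run it without hardware.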
    Here is the new version :

    Note : this works ideally when the CLK input is LOW when /WR changes. However : to create a rising edge, CLK must go down before going up again, this half-clock phase (when low) will be the "clear state". Ensure this period is long enough and that the CLK state is appropriate (check on the 'scope to be sure !!! I'm looking at you, Raspberry Pi...). This is not critical while shifting bits in, however it is a delicate thing to ensure when reading from the TAP (in particular, the first "volatile bit").

    This is solved by changing the precedence of the RS latch to RESET/DIFF/WR, as in this updated circuit :

    And the extended chronogram :

    Now CLK can go low before or after /WR changes.

    Just by changing where a wire is connected.

  • The TAP crosses 3 clock domains !

    Yann Guidon / YGDES, 08/03/2020 at 17:52

    The eXecute module of the TAP connects one domain with no RESET but 2 clocks, to another with one RESET and one clock. This makes it more complex than the others, as hinted at the end of the previous log, "The TAP's eXecute module".

    • On the TAP side : CLK and /WR are two sources of clocking. CLK goes to the counter and the shift registers, /WR goes to the decode logic that takes the control at the end of each message.
      CLK would not go very fast : 10MHz is reasonable (wires and other external effects would probably disturb the signal) and leaves 100ns between consecutive rising clock edges.
      There MUST be a reasonable margin (100ns ?) between the last rising clock edge and the rising edge of /WR.
      There is no RESET for several reasons :
      • The TAP must be able to work while the rest is in /RESET
      • Adding a TAP-specific RESET pin would increase the external footprint and wiring
      • The TAP can control /RESET from the inside
      • Routing another /RESET could burden the rest of the chip
      • By design, the TAP will work with the proper init sequence. (JTAG can also work without /RESET pin)
    • The core has a free-running clock, as well as a HW /RESET external signal.
      The clock could be as slow or fast as one wants, or even weird...
      The /RESET can also be overtaken by the TAP.

    For the communication with the FSM, the signal goes through two DFF as shown below :

    If the FSM clock is fast enough, the OR can be removed but... you're never too sure ! For example going from RESET to START triggers the reload of the instruction memory, which can take 4K cycles at least.

    The first DFF triggers on /WR going up, which is the necessary condition to detect the end of the message, or else the "valid" address could be triggered by enough random data flowing through the shift register. The asynchronous RESET allows the crossing of clock domains, and the clearing always trails the setting by at least one FSM clock cycle, as delayed by the next DFF.

    The DFF on the right also re-synchronises the input data so it is valid at the start of each FSM clock cycle. Otherwise the data could arrive late in the cycle and create race conditions and invalid boolean calculations.
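    As a rough illustration of the two-DFF crossing described above, here is a minimal sketch. It is only one possible reading of the text : the entity, port and signal names are hypothetical, the optional OR is not modelled, and the real schematic may use a different number of FSM-side stages.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical sketch of the TAP-to-FSM crossing, not the project's source.
    entity tap_to_fsm_sync is
      port (
        nWR     : in  std_logic;   -- TAP side : a rising edge ends the message
        valid   : in  std_logic;   -- decoded condition, sampled when /WR rises
        fsm_clk : in  std_logic;   -- free-running core clock
        request : out std_logic);  -- re-synchronised to the FSM clock
    end entity;

    architecture sketch of tap_to_fsm_sync is
      signal flag : std_logic := '0';  -- first DFF, clocked by /WR
      signal seen : std_logic := '0';  -- second DFF, clocked by the FSM
    begin
      -- First DFF : set from the TAP domain, cleared asynchronously
      -- once the FSM domain has captured it.
      process (nWR, seen)
      begin
        if seen = '1' then
          flag <= '0';
        elsif rising_edge(nWR) then
          flag <= valid;
        end if;
      end process;

      -- Second DFF : aligns the flag with the start of an FSM clock cycle.
      process (fsm_clk)
      begin
        if rising_edge(fsm_clk) then
          seen <= flag;
        end if;
      end process;

      request <= seen;
    end architecture;
    ```

    In this sketch the flag is cleared only after the FSM-side DFF has captured it, so a message end is not lost even when the FSM clock is slow.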

  • The TAP's eXecute module

    Yann Guidon / YGDES, 08/01/2020 at 12:43

    The previous modules are quite simple, easy, self-contained, while the X command (described earlier) subtly touches more things at once.

    Talking to the Instruction slice is not very hard, but requires some decoding first, and some of it would be best shared with the Selector. The "addresses" 'S' and 'X' are very close, and sharing the decoder would save some gates.

    I think it's the perfect time to talk about how I mapped the S-decoder to gates :-)

    It started easy enough for the 'S' condition :

    valid <= '1' when  SRi( 7 downto 0)="01010011" -- signature
            and SRi(15 downto 11)="00110"   -- command : MSB select ASCII chars '0'-'7'
            and SAT='0' and W='0' and J1='1' and J0='0'
       else '0';

    Then another simple step is to sort the '0' and '1' to put them in two separate equations, one with AND for the '1's and the '0's are gathered with a big NOR :

    norx <= not (SAT or W or J0 or SRi(15) or SRi(14) or SRi(11)
                 or SRi(7)  or SRi(5)  or SRi(3)  or SRi(2));
    valid <=  SRi(13) and SRi(12) and SRi(6) and SRi(4)
                  and SRi(1) and SRi(0) and J1 and norx;
    

    Then it's easy to group the ORs and ANDs together into 3-input gates. And when an AND gate has a spare input, it can take the result of the NORs :-)

    Finally, bubble-pushing can transform two consecutive ANDs into a NAND followed by a NOR.
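    The bubble-pushing step can be checked on a small case. The sketch below assumes the second AND receives the inverted output of an OR (as with norx above) and verifies exhaustively that AND followed by AND gives the same result as NAND followed by NOR. The testbench name and the constant B are made up for this example.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical testbench : (a and b) and (not o) = (a nand b) nor o
    entity tb_bubble_push is
    end entity;

    architecture check of tb_bubble_push is
    begin
      process
        type bit_pair is array (0 to 1) of std_logic;
        constant B : bit_pair := ('0', '1');
        variable and_and, nand_nor : std_logic;
      begin
        for ia in 0 to 1 loop
          for ib in 0 to 1 loop
            for io in 0 to 1 loop
              -- original form : two consecutive ANDs, fed by an inverted OR output
              and_and  := (B(ia) and B(ib)) and (not B(io));
              -- bubble-pushed form : the inversion is absorbed by the NOR
              nand_nor := (B(ia) nand B(ib)) nor B(io);
              assert and_and = nand_nor
                report "bubble-pushed form differs" severity failure;
            end loop;
          end loop;
        end loop;
        report "all 8 input combinations match";
        wait;
      end process;
    end architecture;
    ```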

    So let's do this all over again but this time the X condition is also decoded so some gates are common.

    S <= '1' when SRi(7 downto 0)="01010011" and SRi(15 downto 11)="00110"
                        and SAT='0' and W='0' and J1='1' and J0='0'
       else '0';
    X <= '1' when SRi(7 downto 0)="01011000"
                        and SAT='0' and W='0' and J2='1' and J1='0'
       else '0';

    The common terms are

    COM <= '1' when SRi(7 downto 4)="0101" and SRi(2)='0' and SAT='0' and W='0'
      else '0';

    and S and X can be taken separately :

    S <= '1' when SRi(3)='0' and SRi(1 downto 0)="11" and SRi(15 downto 11)="00110"
                        and J1='1' and J0='0'
       else '0';
    X <= '1' when SRi(3)='1' and SRi(1 downto 0)="00" and J2='1' and J1='0'
       else '0';

    Now these 3 can be checked in parallel, let's separate their bits according to their value.

    X <= SRi(3) and J2 and
           not ( SRi(1) or SRi(0) or J1); -- nice fit for this one !
    S <= SRi(1) and SRi(0) and SRi(13) and SRi(12) and J1 and
           not (SRi(3) or J0 or SRi(15) or SRi(14) or SRi(11));
    COM <= SRi(6) and SRi(4) and
           not (SAT or W or SRi(2) or SRi(7) or SRi(5));

    From there the gates are easy to cluster and bubble-push.
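    Before clustering the gates, the factoring can be verified by brute force. The following testbench is a sketch (its name and variables are not from the project's source) : it compares the original S and X comparisons with the factored terms ANDed with COM, over every combination of the bits that matter. SRi(10 downto 8) do not take part in either condition and are tied to 0.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical testbench for the S / X / COM factoring.
    entity tb_sx_factoring is
    end entity;

    architecture check of tb_sx_factoring is
    begin
      process
        variable v   : std_logic_vector(17 downto 0);
        variable SRi : std_logic_vector(15 downto 0);
        variable SAT, W, J2, J1, J0 : std_logic;
        variable S_ref, X_ref, S_t, X_t, COM : std_logic;
      begin
        for i in 0 to 2**18 - 1 loop
          v   := std_logic_vector(to_unsigned(i, 18));
          SRi := v(12 downto 8) & "000" & v(7 downto 0);
          SAT := v(13); W := v(14); J0 := v(15); J1 := v(16); J2 := v(17);

          -- reference : the unfactored conditions
          S_ref := '0';
          if SRi(7 downto 0) = "01010011" and SRi(15 downto 11) = "00110"
             and SAT = '0' and W = '0' and J1 = '1' and J0 = '0' then
            S_ref := '1';
          end if;
          X_ref := '0';
          if SRi(7 downto 0) = "01011000"
             and SAT = '0' and W = '0' and J2 = '1' and J1 = '0' then
            X_ref := '1';
          end if;

          -- factored terms, sorted into ANDs and NORs
          X_t := SRi(3) and J2 and not (SRi(1) or SRi(0) or J1);
          S_t := SRi(1) and SRi(0) and SRi(13) and SRi(12) and J1 and
                 not (SRi(3) or J0 or SRi(15) or SRi(14) or SRi(11));
          COM := SRi(6) and SRi(4) and
                 not (SAT or W or SRi(2) or SRi(7) or SRi(5));

          assert (S_t and COM) = S_ref report "S mismatch" severity failure;
          assert (X_t and COM) = X_ref report "X mismatch" severity failure;
        end loop;
        report "S and X factoring verified";
        wait;
      end process;
    end architecture;
    ```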

    The result is 11 gates, the speed is not striking but 4 or 5 gates of latency shouldn't be limiting for this slow circuit and it is only 2 more gates than the previous circuit.

       sa: entity  OR3 port map(A=>SRi(15), B=>SRi(14), C=>SRi(11), Y=>tSo  );
       sb: entity NOR3 port map(A=>J1     , B=>SRi( 3), C=>tSo    , Y=>tSn  );
       sc: entity AND3 port map(A=>SRi(13), B=>SRi(12), C=>tSn    , Y=>S2   );
       sd: entity AND3 port map(A=>SRi( 1), B=>SRi( 0), C=>J0     , Y=>S1   );
    
       c1: entity  OR3 port map(A=>SRi(2) , B=>SRi( 7), C=>SRi( 5), Y=>Co1  );
       c2: entity NOR3 port map(A=>SAT    , B=>W      , C=>Co1    , Y=>Co2  );
       co: entity AND3 port map(A=>SRi(6) , B=>SRi( 4), C=>Co2    , Y=>COM  );
    
       x1: entity NOR3 port map(A=>SRi(1) , B=>SRi( 0), C=>J2     , Y=>tXo  );
       x2: entity AND3 port map(A=>tXo    , B=>SRi( 3), C=>J1     , Y=>tX   );
    
      vx:  entity AND3 port map(A=>tX     , B=>COM    , C=>FB     , Y=>X    );
      vld: entity AND3 port map(A=>S1     , B=>COM    , C=>S2     , Y=>valid);
    

    (one thing I dislike about VHDL is the requirement to label ALL the instantiated entities, it really gets nasty fast).


    OK !

    • Now, the Selector decodes the execute address with only 2 gates of overhead.
    • The clock to the slice is only gated by /WR, already done by the Selector.
    • The data to the slice shift register comes directly from the Selector as well (the MSB of the Command bus)

    But the slice requires more than these signals and the FSM is an even tougher beast... Let's just focus on the control of the slice :

    • Imux : the source of the instruction is selected by the current command (STEPX, NOPX ?) which requires some decoding....

Discussions

salec wrote 10/09/2019 at 09:18

YGREC can stand for so many things, but since my wife has been learning French on Duolingo I can't avoid noticing that it is also a wordplay on French spelling of "Y". 

:-)

Yann Guidon / YGDES wrote 10/09/2019 at 10:03

oh, of course, yes, too ;-)

salec wrote 10/09/2019 at 12:04

always have an opening joke/tease for audience :D

Yann Guidon / YGDES wrote 10/09/2019 at 12:46

@salec  always !

[deleted]

[this comment has been deleted]

Yann Guidon / YGDES wrote 04/14/2019 at 08:56

That "purposeful sense" may look drowned in the proliferation of projects, angles and ideas but it is still clear to me, as it has been my main hobby since 1998 at least :-D

I'm glad you enjoy !

Yann Guidon / YGDES wrote 11/04/2018 at 07:11

Another note for later :
writing to A1 or A2 starts a fetch from RAM. In theory the latency is the same as instruction memory and one wait state would be introduced. However the processor can also write directly so the wait state would be only on read to the paired data register...

Yann Guidon / YGDES wrote 11/04/2018 at 06:55

Note for later : don't forget the transparent latch on the destination register address field, for the (rare) case of LDCx, because the 2nd cycle doesn't preserve the opcode etc.

Yann Guidon / YGDES wrote 11/04/2018 at 07:18

OK, not a transparent latch, but a DFF and a mux, plus some logic to control it.

-- DFF, every cycle :

SND_latched <= SND_field;

LDCx_flag <= '1' when (LDCx_flag='0' and opcode=opc_LDC and writeBack_enabled='1')   else '0';

-- MUX2 :

WriteAddress <= SND_latched when LDCx_flag = '1' else SND_field;

______

Note : LDCx into PC must work without wait state because it's connected directly to SRI, as an IMM8, and no extra delay is required. PC wait state is required for ADD/ROP2/SHL and IN.

Frank Buss wrote 10/27/2018 at 12:51

Do you really plan 8 byte-wide registers? This would require thousands of relays :-)

Yann Guidon / YGDES wrote 10/27/2018 at 14:26

no :-)

8 registers, 8 bits each = 64 storage bits.
1 relay per bit => 64 relays


The trick is to use the hysteretic mode of the relays :-)

Frank Buss wrote 10/27/2018 at 16:17

Ok, makes sense. Maybe change the project description, someone might think you are planning a 64 bit architecture.
BTW, could this be parametrized for the address and data size? If you implement it in VHDL, you could use generics for this, would be no additional work to use just the generic names instead of hard coded numbers. Except maybe some work for extending the instruction opcodes.

Yann Guidon / YGDES wrote 10/27/2018 at 17:16

Frank : DAMNIT you're right !

I updated the description...

Yann Guidon / YGDES wrote 10/27/2018 at 17:19

For the parameterization : it doesn't make sense at this scale. Every fraction of bit counts and must be wisely allocated.

Larger architectures such as #YASEP Yet Another Small Embedded Processor and #F-CPU have much more headroom for this.

Bartosz wrote 11/08/2017 at 16:40

Will this work on Epiphany or oHm or another cheap machine?

Yann Guidon / YGDES wrote 11/08/2017 at 18:07

I'm preparing a version that would hopefully use less than half of an A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !

Bartosz wrote 11/14/2017 at 14:13

The question is the price and the possibility to buy.

Yann Guidon / YGDES wrote 11/14/2017 at 16:08

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCBs and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
