
YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

YGREC can stand for many things, such as "YG's Relay Electric Computer", "Yann's Germanium and Relay Equipped Computers" or "YG's Ridiculous Electronic Contraption". You decide !

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of Data RAM (plus parity ?)
* 8 registers, 8 bits each
* fewer relays/gates than the YGREC16
This core is so simple that I focus now on the debug/test access port and the register set's structure.
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA, ASIC, you name it!

I give up on the idea of playing the Game of Life (the forte of #YGREC-РЭС15-bis) but I design a VHDL version because @llo sees the YGREC8 as a perfect replacement for PICs for his #SteamBot Willie !

A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is expected to be:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

I shrunk the instruction word down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field. Speed should be great, with a pretty short critical datapath, and all the instructions execute in one clock cycle (except the LDCx instructions and computed writes to PC).

The fields have evolved with time (I have tried various locations and sizes). For example:

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...
20181024: changed the names of some fields
20181101: modified the conditions to change Imm3 into Imm4
20180112: Imm9 back again !

There are 18 useful opcodes (plus INV, HLT and NOP), and most share two instruction forms : either an IMM8 field, or a source & condition field. The source field can be a register or a short immediate field (4 bits only but essential for conditional short jumps or increments/decrements).

The main opcode field has 4 bits and the following values:

Logic group :

  • AND
  • OR
  • XOR
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware: there is no point in adding 0, so ADD with a short immediate (Imm4) skips the value 0 and the range becomes -8 to -1 and +1 to +8 (see 17. Basic assembly programming idioms).
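To make the "skip zero" idea concrete, here is a minimal sketch in Python. The source only states the resulting range; the encoding used here (read the field as 4-bit two's complement, then bump the non-negative values by one) is an assumption, and `decode_add_imm4` is a hypothetical helper name.

```python
def decode_add_imm4(field: int) -> int:
    """Map a 4-bit field to the ADD short immediate, skipping zero.

    ASSUMED encoding (not confirmed by the source): the field is read as
    4-bit two's complement, then non-negative values are incremented.
    """
    if not 0 <= field <= 0xF:
        raise ValueError("Imm4 field must fit in 4 bits")
    signed = field - 16 if field & 0x8 else field  # -8 .. +7
    return signed + 1 if signed >= 0 else signed   # -8 .. -1, +1 .. +8


# Every 4-bit pattern maps to a distinct, non-zero increment.
values = sorted(decode_add_imm4(f) for f in range(16))
print(values)
```

Whatever the real bit assignment is, the point stays the same: the otherwise useless "ADD 0" code is recycled to reach +8.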

Shift group (optional)

  • SH/SA direction is sign of shift, I/R(bit9) is Logic/Arithmetic flag.
  • RO/RC direction is sign of shift, I/R(bit 9) allows carry to be rotated.

Control group:

The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available. All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S (Sign, MSB)
  • B0, B1, B2, B3 (input signals, for register-register form)

Instruction code 0000h should map to NOP, and the NEVER condition, hence ALWAYS is coded as 1.

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism): condition is implicitly ALWAYS because it's an IMM8 format.

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.


Logs:
1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Coloration syntaxique pour Nano (syntax highlighting for Nano)
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing the core
24. Synchronous Serial Debugging
25. MUX trees
26. Flags, PC and IO ports
27. Binary translation (updated)
28. Even better register set
29. A better relay-based MUX64
30. Register set again
31. Rename that opcode !
32. Register set again again
33. Yet Another Fork
34. What can it run ?
35. More register set layout
36. More VHDL and more gates
37. R7 P&R
38. Program Counter and other considerations
39. Bus names (SRC-SRI, DST/SND)
40. Now faster without the "PC-swap" MUX
41. A diode-less...


YGREC8_VHDL.20190422.3.tgz

R7 decoder in A3P tiles

x-compressed-tar - 174.65 kB - 04/22/2019 at 01:39


YGREC8_VHDL.20190422.tgz

a better decoder for the register set

x-compressed-tar - 173.01 kB - 04/21/2019 at 17:58


YGREC8_VHDL.20190421.tgz

Redesigning the register set

x-compressed-tar - 155.92 kB - 04/21/2019 at 00:13


YGREC8_VHDL.20190412.tgz

New gates library, better ALU and decoder

x-compressed-tar - 152.59 kB - 04/11/2019 at 23:23


YGREC8_VHDL.20190404.tgz

more versions of the ALU8 decoder

x-compressed-tar - 148.85 kB - 04/04/2019 at 23:10


View all 31 files

  • Census of the gates

    Yann Guidon / YGDES, 3 days ago, 0 comments

    It's time for a little census.

    [yg@Host-002 VHDL]$ grep -r 'entity' * |grep 'port' |grep 'map' |sed 's/.*entity //'|sed 's/ port.*//'|sort|uniq 
    AND2  
    AND2A
    AND3
    AND3A 
    AO1
    AX1C  
    CLA3
    INV   
    MX2
    NAND2 
    NAND3 
    NAND3A
    NOR2  
    NOR3  
    NOR3A 
    OA1A
    OR3 
    XA1
    XO1
    XOR2  

     (I removed the complex unit names by hand)

    There are 20 gates so far, more, and more complex, than what #Shared Silicon provides (only INV, NOR, NOR3, NAND2, NAND3 and some T-gates).

    I believe that by using more complex gates with more inputs (but reasonably so), there is a bit of performance and size benefit. I don't see any roadblock to get the missing gates : either I can make mine easily, or I borrow from existing free libraries.

  • How to divide the register set's power consumption by about 5

    Yann Guidon / YGDES, 3 days ago, 0 comments

    The latest source code archive contains the enhanced decoder for the register set, including 3 strategies:

    • Straight (fast)
    • update only meaningful control lines
    • update only meaningful control lines when the related field is used

    I provide a pseudo-randomised test to compare these strategies and the outcome is great:

    [yg@Host-001 R7]$ ./test.sh 
    Testing R7:
      straight decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 702273 toggles
      latching decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 301068 toggles
      Instr-sensitive :R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 160231 toggles
    R7: OK
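The per-instruction averages follow directly from the report lines above. A short Python sketch, using only the numbers printed by `test.sh` (the dictionary keys are informal labels, not names from the source code):

```python
# Toggle counts copied from the test.sh report above.
ITERATIONS = 100_000
toggles = {
    "straight": 702_273,
    "latching": 301_068,
    "instr-sensitive": 160_231,
}

# Average number of control-line toggles per decoded instruction.
per_instruction = {name: count / ITERATIONS for name, count in toggles.items()}
ratio = toggles["straight"] / toggles["instr-sensitive"]

for name, avg in per_instruction.items():
    print(f"{name:16s} {avg:.2f} toggles per instruction")
print(f"straight / instr-sensitive = {ratio:.2f}")
```

This gives roughly 7.0, 3.0 and 1.6 toggles per instruction, and a ratio a little above 4 between the first and the last strategy.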
    

    There is a ratio of approx. 1/4.4 (702273 vs. 160231 toggles, so a bit short of 1/5) between the first and third results, which I explain below :

    • Given that the probability of one bit being set is pretty close to 1/2, it makes sense that the first "straight" decoder toggles the output bits every other time in average. There are 14 control lines to drive and with a 1/2 probability, 7 lines change.
    • The next method gives a better result, that you can understand using similar logic : we get 3 toggles per instruction, which makes total sense. There are 2 decoders but only 1/2 chance of change, so we can focus on one decoder. Each decoder updates only 3 of the 7 control lines because the other 4 give results that will not be used. So far, so good, no surprise at all.
    • The last method gives an average toggle rate of 1.6 per instruction. This is one half of the previous result and though it should be taken with a lot of precaution, the benefit is clear. Some instructions (about 1/4) don't use the SND field, and the SRI field is not used when Imm8 or Imm4 fields are used, giving a further significant reduction of toggles.
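The first two figures can be reproduced with a small Monte-Carlo model. This is only a sketch of the reasoning, not the actual `R7_tb_dec.vhdl` testbench: it assumes each MUX8 is a three-level tree of MUX2s with 1 + 2 + 4 = 7 control lines, ignores the b0/b1 swap in the middle of the tree, and feeds uniformly random addresses to both read ports. The function names are made up for the example, and the third (instruction-sensitive) strategy is not modelled because it depends on the opcode mix.

```python
import random

LINES_PER_PORT = 7  # 1 top + 2 middle + 4 bottom MUX2 control lines


def address_bits(addr):
    return addr & 1, (addr >> 1) & 1, (addr >> 2) & 1


def straight(addr, previous):
    """Drive every control line from the address bits, used or not."""
    b0, b1, b2 = address_bits(addr)
    return [b2, b1, b1, b0, b0, b0, b0]


def latching(addr, previous):
    """Update only the three lines on the selected path of the tree."""
    b0, b1, b2 = address_bits(addr)
    lines = list(previous)
    lines[0] = b2                # top level is always used
    lines[1 + b2] = b1           # one of the two middle lines
    lines[3 + 2 * b2 + b1] = b0  # one of the four bottom lines
    return lines


def average_toggles(decoder, iterations=20_000, seed=1):
    rng = random.Random(seed)
    ports = [[0] * LINES_PER_PORT for _ in range(2)]  # two read ports
    toggles = 0
    for _ in range(iterations):
        for p in range(2):
            new = decoder(rng.randrange(8), ports[p])
            toggles += sum(old != cur for old, cur in zip(ports[p], new))
            ports[p] = new
    return toggles / iterations


avg_straight = average_toggles(straight)
avg_latching = average_toggles(latching)
print(f"straight: {avg_straight:.2f}, latching: {avg_latching:.2f}")
```

With uncorrelated inputs the two averages land close to 7 and 3 toggles per instruction, which matches the explanation given in the first two bullets.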

    Of course, these numbers are NOT representative of real use cases. I used pretty uncorrelated bits as sources, while real workloads have some sorts of patterns. The numbers will certainly increase or decrease, depending on each program.

    There is a compromise for each situation and the 3 methods are provided in the source code, so you can choose the best trade-off between latency and consumption. The numbers are pretty good and I think I reached the point of diminishing return. Any "enhancement" will increase the logic complexity with insignificant gains...

  • A little note

    Yann Guidon / YGDES, 4 days ago, 0 comments

    I just thought about something else, which is a good argument FOR using a latch or flip-flop to drive certain control signals, in particular MUX2s:

    MUX2 often needs complementary control signals. In A3P and other technologies, each MUX2 receives the control signal and implements an inverter inside the gate. However, ASICs can't always afford this luxury because the inverter would be uselessly duplicated, increasing latency and surface.

    OTOH the latches and flip-flops usually work by looping two inverters back to each other and it's often possible to get the positive as well as the complementary output.

    When signals are well routed, the control signal's latch can also serve as an inverter for free: there are 2× more signals (and double the load) but the logic complexity is reduced and fewer transistors are used. This is especially critical for the register set, to save room...

    So while the FPGA version will use MUX2s for the latch and the register selector, the ASIC version will use a latch with 2 complementary outputs and the MUX2 is reduced to a simple gate that performs (A and B) or (C and D) (with for example B and D as control signals, using only 8 transistors in classic CMOS or 2/4 transistors with pass gates).
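As a quick sanity check of that equivalence, the following Python sketch compares a plain MUX2 with the (A and B) or (C and D) gate driven by the two complementary outputs of a latch. The function names are illustrative only, and the latch is reduced to its Q and /Q outputs.

```python
from itertools import product


def mux2(sel, a, c):
    """Reference behaviour: select a when sel=0, c when sel=1."""
    return c if sel else a


def and_or(a, b, c, d):
    """The simplified gate: (A and B) or (C and D)."""
    return (a & b) | (c & d)


def latch_outputs(bit):
    """A latch provides both the stored bit (Q) and its complement (/Q)."""
    return bit, bit ^ 1


# Exhaustive check over all inputs: B = /Q and D = Q act as control signals.
equivalent = all(
    and_or(a, latch_outputs(sel)[1], c, latch_outputs(sel)[0]) == mux2(sel, a, c)
    for sel, a, c in product((0, 1), repeat=3)
)
print("AND-OR gate matches MUX2:", equivalent)
```

The inverter has not disappeared: it lives inside the latch, where it exists anyway, instead of being duplicated in every MUX2.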

  • Control-gating the register set

    Yann Guidon / YGDES, 5 days ago, 0 comments

    The logic strategy has evolved since log 61 ("Making Y8 more energy-efficient with a deglitcher") and the first code for the ALU is not yet satisfying, so instead of digging that part even more, let's shift our attention to the register set... A clean, fresh reboot of an old subject could work :-D

    Let's start with a reminder about the structure of the registers: there are 8 bytes with 2 read ports and 1 write port and the write port shares the address with one read port (further saving instruction bits, but not control logic). For now we focus on the decode logic of the read ports, which we then duplicate to get the SND and SRI fields.

    The register set is heterogeneous and uses multiplexers organised as a "balanced control balanced binary tree", seen in the picture below:

    Address bits b0 and b1 are swapped at the middle of the tree to even/balance the load but control gating reduces this constraint (we still use this method because the fanout constraint is not eliminated). Bit #2 at the top of the tree is not affected.

    One bitslice uses two of these fancy "MUX8" to read the two operands:

    Registers 5 and 6 are swapped, this is the only difference to remember. The address bits work almost like a normal MUX8, with the fanout slightly enhanced.

    We are now interested in the enable logic : only 3 of the address bits need to be changed, out of 7, and only if needed. For example there is no reason to change B1B when B2=0.

    So far, so good...


    The register set is probably the most critical thing to decode As Soon As Possible so these simple equations are very convenient. But this is not the end of the story because another class of data can further inhibit the addresses:

    • Some instructions (INV, IN, OUT) don't use the SRI field
    • Other instructions (INV, IN) don't use the SND field
    • The SRI field is not used in IMM8 or IMM4 form

    The last one seems very easy to solve : bits #11 (R/Imm8 flag) and #10 (R/Imm4 flag) must be 0 to select SRI, so the Enable for the whole MUX8 must also AND with not ( Instruction(11) or Instruction(10) ). However this first approximation disables the 6 opcodes that don't use Imm8 (in particular the 3 opcodes that set bit 11: SA, RC, LDCH) and doesn't disable IN, OUT and INV (though only OUT sets bit 11).

    However...

    1. IN, OUT and INV are not expected to be executed frequently enough to save significant power if control-gated. The few extra decoding gates are not a big deal but they add precious latency in the critical datapath.
    2. OTOH it is critical to correctly decode the instructions that don't use the Imm8 field. The bit #11 must be disabled when the opcode is after CALL or (conversely) enabled up to CALL. The last option is the easiest to code, with a 3-input gate : not(b15) or (not(b14) and not(b13))

    SND is not used for the SET, CALL, LDCL, LDCH, IN and INV instructions. SET is likely the most used opcode so its gating might benefit the system. It is easily grouped with the neighbour CALL instruction with a 3-input gate (b15 & /b14 & /b13) :

    LDCL, LDCH and IN can be easily added (b15 & b14 & /b13) and b14 is the only difference so the inhibition of reading SND is simply (b15 & /b13).
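The gate equations above are easy to check exhaustively. The Python sketch below assumes the 4-bit opcode sits in instruction bits 15 to 12 (so only b15, b14 and b13 matter here); this placement is inferred from the bit names used in the text, and the function names are hypothetical.

```python
def top_bits(opcode):
    """Return (b15, b14, b13), ASSUMING the opcode occupies bits 15..12."""
    return (opcode >> 3) & 1, (opcode >> 2) & 1, (opcode >> 1) & 1


def imm8_capable(opcode):
    """not(b15) or (not(b14) and not(b13))"""
    b15, b14, b13 = top_bits(opcode)
    return bool((not b15) or ((not b14) and (not b13)))


def snd_inhibit_two_terms(opcode):
    """(b15 & /b14 & /b13) or (b15 & b14 & /b13)"""
    b15, b14, b13 = top_bits(opcode)
    return bool((b15 and not b14 and not b13) or (b15 and b14 and not b13))


def snd_inhibit_merged(opcode):
    """b15 & /b13, the simplified form"""
    b15, _, b13 = top_bits(opcode)
    return bool(b15 and not b13)


imm8_opcodes = [op for op in range(16) if imm8_capable(op)]
inhibited = [op for op in range(16) if snd_inhibit_merged(op)]
same = all(snd_inhibit_two_terms(op) == snd_inhibit_merged(op) for op in range(16))

print("opcodes allowed to use bit 11 as the Imm8 flag:", imm8_opcodes)
print("opcodes that inhibit the SND read:", inhibited)
print("two-term and merged equations agree:", same)
```

Under that assumption the 3-input gate selects opcode values 0 to 9, and dropping b14 from the SND equation changes nothing: both forms select exactly the values 8, 9, 12 and 13.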


  • Scheduling (2)

    Yann Guidon / YGDES, 04/16/2019 at 06:20, 0 comments

    More thinking and experimenting happened, as I faced a growing complexity with the decoding logic. There is the "fast" equation, then the "set" and "reset" ones, all must be written, checked, translated, mapped. And these equations are not compatible because there is no way to dynamically change from "turbo" to "eco" mode. The "eco" equations are too large and might still glitch a bit during decoding.

    The conclusion is to simply drop the "set" and "reset" equations. I keep the "fast" equations and add a transparent latch ("T-latch") that gets selectively enabled by a new (single) equation (that indicates the "don't care"s). I see several advantages:

    • Fewer equations to care about and optimise, fewer gates.
    • 3 configurations are possible, where the "fast" equation is used all the time, while the "enable" gates can be discarded if only a "fast" version is implemented. A compromise is also possible, when the "enable" equation is forced to true in "turbo" mode.
    • The "enable" equation can be a bit slower than the "fast" equations, which further reduces the chances of glitches.
    • The T-latch adds little latency when data pass through.
    • T-latches are quite easy to implement, using a multiplexer for example.
    • You can get the VHDL code for the dual-mode version and let the synthesizer trim/prune the gates when you force/stick the "turbo" and/or "phase" signals to 1 (this is easier to configure, test, compare...).

    It looks simpler and more flexible than the previous S/R scheme...

  • Scheduling

    Yann Guidon / YGDES, 04/13/2019 at 21:24, 0 comments

    The Y8 design works nicely with a standard, classic synchronous clock. It was indeed designed this way: each clock cycle would be a new instruction (with a few exceptions). This design is simple though a previous log (61. Making Y8 more energy-efficient with a deglitcher) shows that a more power-efficient (yet slower) method exists, using SR latches. Similar systems exist with transparent latches and the logical conclusion is to use a multi-phases clock.

    In fact, a wide range of choices is available but I have barely scratched the surface, only looking for specific enhancements at specific places. For example I have only latched data at the instruction decoder level but other places would also be good targets : the output of the register set (with the IMM/REG MUX) and the result bus come to mind. Clock gating is nice but data gating is nice too :-)

    A 4-phases clock (with many local gates) sounds like the best approach though it wouldn't account for the individual latencies of the individual stages. In fact, it is very similar to an unbalanced pipeline... This situation already appeared with the YASEP where a configurable pipeline was explored.

    There are already several options for the decoder (either fast or latched) but I have to get ready for more options (and combinations). The clock generator's design will also follow from the chosen options. Fortunately the units themselves are not affected, only the connections between them.


    In the current design iteration of the Y8, the instruction cycle is easily split into about 4 parts.

    • The fetch/decode step : when the main clock goes up, the NPC value is latched by the memory block, the RAM is read and the value is fed to the decoding logic (approx 3 or 4 tiles). Given an access time of about 3ns and 1ns per tile, that first phase is around 7ns on baseline A3P.
    • The operand read/select phase : lookup the register set and mix/select the immediate operand, going through approx. 5 tiles (or 5ns)
    • The "execution" phase is at least 7 tiles deep, given the results of the ALU. Some headroom might be needed.
    • The writeback evaluates the condition, checks the parity/sign/nullity of the result, MUXes the late units (and eventually latches the result). The result is fanout to the register set, which is subject to sample&hold constraints. That would be about 5 tiles long.

    Using the worst case of 7ns per phase, the "minor clock" would be around 140MHz and the instruction cycle 35MHz. This could be bumped to about 45MHz  if the writeback step overlaps the fetch/decode step. This is not very far from the estimated 49MHz of the unoptimised/unpipelined synthesis (13. First synthesis).
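The arithmetic behind these figures can be laid out as a short Python sketch. The per-phase delays are the rough estimates from the list above (1 ns per tile, "at least 7 tiles" for execution taken as 7 ns); the variable names are made up for the example.

```python
# Rough per-phase latencies in nanoseconds, taken from the estimates above.
phases_ns = {
    "fetch/decode": 7.0,
    "operand read/select": 5.0,
    "execute": 7.0,
    "writeback": 5.0,
}

worst_ns = max(phases_ns.values())            # every phase gets the worst-case slot
minor_clock_mhz = 1000.0 / worst_ns           # one phase per minor clock tick
instruction_mhz = minor_clock_mhz / len(phases_ns)
# If writeback overlaps the next fetch/decode, only 3 slots are left per instruction.
overlapped_mhz = minor_clock_mhz / (len(phases_ns) - 1)

print(f"minor clock      : {minor_clock_mhz:.1f} MHz")
print(f"instruction rate : {instruction_mhz:.1f} MHz")
print(f"with overlap     : {overlapped_mhz:.1f} MHz")
```

This yields about 143 MHz and 36 MHz, in line with the rounded 140 MHz and 35 MHz quoted above; the overlapped case comes out slightly above the quoted 45 MHz because it leaves no headroom.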

    I'd like to offer at least the "simple/fast" and the "phased" options because one promotes speed while the other draws less current but most importantly is easier/smaller to implement in ASIC, since latches use less room than flip-flops (which are really just 2 transparent latches back to back).

    T-latches are typically implemented by Actel with a MUX2 looped back to itself. ASICs prefer pass-gates, which is more or less the same. Sprinkling T-latches all over a chip is an old practice, because

    1. there is less load on the clock network
    2. 2 latches make a DFF
    3. T-latches use less surface and the register set would use half the surface it needs with DFFs.

    OTOH, DFF come for free with FPGA...

  • GHDL in a docker container

    Yann Guidon / YGDES, 04/03/2019 at 12:12, 0 comments

    Ethan Waldo has run the scripts successfully under Docker :-)

    version: '3'
    services:
      ghdl:
        image: ghdl/ghdl:ubuntu18-llvm-5.0
        volumes:
          - .:/src
        working_dir: /src

    I have no experience with it but I'll add the above configuration file in the README.txt if people want to use containers.

    GHDL has awesome strengths and it shows again :-) and it's only the beginning.

  • Floorplanning

    Yann Guidon / YGDES, 04/02/2019 at 21:38, 0 comments

    So far, here is the floorplan for April, targeting ASIC and FPGA :

    FPGA doesn't need a floorplan (the synthesiser will try to "do something") but I want to try it, at least to 1) validate the intended ASIC 2) see if I can outsmart the synthesiser and reach higher speeds.

    INC8 (increments PC) is done and ALU8 is mostly finished. I try to polish the instruction decoder. Once it's done, I'll easily bring SHL8 back from the older versions, and get a better latency estimate. This complete operating datapath is necessary to create the "final MUX" that creates the RESULT bus.

    Not shown here is the I/O system. The address goes straight from the instruction decoder, using "partial address decoding", to the various units, and the eventual data to read is ORed/MUXed back to the datapath. This will take a lot of time/latency so this MUX comes last. There is no defined IO structure so it's hard to gauge...


  • Making Y8 more energy-efficient with a deglitcher

    Yann Guidon / YGDES, 03/30/2019 at 18:54, 0 comments

    When you create your processor, you want it to work, then priorities shift to speed because the processor must be compared on its merits. Then... it's over, it's too late, the infernal bikeshedding machine is in motion and hard to steer.

    One thing is sure, however : the industry has spent the last 15 years steering toward energy-efficiency, toward more MIPS per Watt and it's often overlooked in the early drafts for amateur designs. We don't care how much our pet machine will consume, since it won't run that long or that much. But efficiency is a staggering goal, even more than performance because today we have so much power at our fingertips. But think about this : if your processor runs 2× slower but uses 3× less energy, you are actually winning and you can put 2 or 3 processors in parallel easily.

    Energy efficiency is not a significant goal for the #YGREC8, nor is performance, or else I wouldn't bother with a relay version. I don't expect the ASIC or FPGA version to run on a battery but you never know. It's never too early to build this into a design that is intended to be the basis for larger designs... And those designs will want to avoid using a heatsink.


    Today, CMOS is the dominant technology and the initial target of my designs, either in FPGA or ASIC (the rest is for the lulz). Its consumption is proportional to three main parameters C-T-V :

    • power supply Voltage (squared) : reducing the voltage is a simple and easy way to reduce the power, to a certain extent at least, because the  out-of-range values will not allow the circuit to run fast (or at all).
    • Capacitance : the gates of the transistors act like capacitors that must be charged and discharged. The capacitance drops as technologies shrink, which we can't control easily, but as a rule of thumb: the more transistors are toggled, the more they will consume. That's an interesting angle of attack...
    • Toggles : the faster you run the circuit, the more charges and discharges, the more energy is needed to switch the transistors. Energy-saving methods often include running the circuits slower.
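The three parameters combine in the usual rule of thumb for CMOS dynamic power, P ≈ C·V²·f, where f stands for how often the capacitance is actually toggled. The Python sketch below only compares relative values against an arbitrary baseline of 1.0; the 0.8 and 0.5 factors are illustrative inputs, not measurements of the YGREC8.

```python
def relative_power(capacitance=1.0, voltage=1.0, toggle_rate=1.0):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f."""
    return capacitance * voltage ** 2 * toggle_rate


baseline = relative_power()
lower_voltage = relative_power(voltage=0.8)       # 20% lower supply voltage
fewer_toggles = relative_power(toggle_rate=0.5)   # half as many toggles
both = relative_power(voltage=0.8, toggle_rate=0.5)

print(f"baseline      : {baseline:.2f}")
print(f"lower voltage : {lower_voltage:.2f}")
print(f"fewer toggles : {fewer_toggles:.2f}")
print(f"both          : {both:.2f}")
```

The voltage term is squared, so it pays off fastest, but it belongs to the implementer; the toggle term is the one the logic designer can act on directly.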

    Some of those parameters are in the realm of the implementer. For example, the supply voltage or the frequency will not directly affect the logic of the circuit (within the nominal operating conditions of course).

    Some other parameters are directly under the control of the designer, and can be summarised as : toggle less, less often.

    I have mentioned running the circuits slower, but there are other methods as well. Some belong to the purely digital design realm, others to silicon sorcery, such as playing with a voltage bias with the bulk of the substrate, altering the temperature, changing the doping or composition... Let's forget them.

    What matters at this level is what we can do at the architectural level. We can't control the clock speed in the absolute, but any design choice affects either latency/speed or consumption, usually both. And this is relative.

    One textbook example is "clock gating" : drive the clock input of the group of DFF that really need to be updated. It's easy to do, and easy to mess up, because it's at the edge of timing analysis and FPGA prefer "clock enable" with one main clock.

    Clock gating is finely tuned in very low-power devices. Years ago, I heard of a complex processor being split into more than a hundred clock domains and I'm sure the numbers are much higher today with Intel's latest processors, where not only clocks, but power domains are fine grained. But this technique is not easily portable.

    The YGREC8 has other places where some "toggle optimisation" can be effective and easier to design, given the proper framework and structures. One such example appears in the ALU where several control signals not only have to be propagated to 8 (or more) logic gates (a fanout of 8 is not negligible) but the results of the changes...


  • A new unit : the decoder

    Yann Guidon / YGDES, 03/29/2019 at 07:03, 0 comments

    I'm starting to update and refine the architecture of the Y8, by making a separate unit, that will hopefully be more flexible.

    Usually, each unit will receive the required parts of the instruction word and decode the necessary signals. This is very easy with a clean RISC processor, which could be described as a "distributed decoding" architecture.

    In practice, there are other constraints as well. 

    • Structure and layout: the decoding gates often "break" the bitslice-like layout
    • Management of fanout: some bits from the instruction word might need appropriate decoding and buffering with a global scope
    • Sequencing and states: there has to be somewhere to store information about what the processor is doing, for example with the LDCx instructions and the computed writes to PC, which require more than one clock cycle.
    • Power consumption reduction: reduce the current by toggling bits only when necessary.

    That last argument convinced me about this new unit because optimising toggles requires a global perspective that each individual unit can't have. The current instruction word also needs to be latched for the multi-cycles operations (such as LDCx).

    The decoder is a strip of logic gates that propagates bits from the instruction word, parallel to the bitslices, and decodes and spreads control signals in a "fishbone" pattern (perpendicular to the bitslices).

    This increases the complexity of the clock and timing because performance dictates that the units must get their respective instruction bits directly from the program memory.

    • If the instruction must go through a DFF, the value is delayed by one clock cycle and throughput/execution speed is hit...
    • Going through a transparent latch saves time but increases the sensitivity to timing anomalies. A multi-cycles clock becomes necessary.

    One solution though is to store the last instruction and combine it with the new instruction with one or two logic layers at most. It might not work for most control signals and it could generate some spikes, which I'm indeed trying to avoid (because they eat power).

    I'm also considering adding transparent latches at the data inputs of the ALU.

    But before I can add the latches, I have to take the control logic out of the units.


    20190330 : I think I've found the trick, using simple RS latches...

View all 69 project logs


Discussions

castvee8 wrote 04/13/2019 at 22:57 point

I so love your commitment and enthusiasm ! I was playing with vacuum tube calculators a bit since last year and just keep going down the rabbit hole. Your projects seem to at least make purposeful sense.


Yann Guidon / YGDES wrote 04/14/2019 at 08:56 point

That "purposeful sense" may look drowned into the proliferation of projects, angles and ideas but it is still clear to me since it's my main hobby since 1998 at least :-D

I'm glad you enjoy !


Yann Guidon / YGDES wrote 11/04/2018 at 07:11 point

Another note for later :
writing to A1 or A2 starts a fetch from RAM. In theory the latency is the same as instruction memory and one wait state would be introduced. However the processor can also write directly so the wait state would be only on read to the paired data register...


Yann Guidon / YGDES wrote 11/04/2018 at 06:55 point

Note for later : don't forget the transparent latch on the destination register address field, for the (rare) case of LDCx, because the 2nd cycle doesn't preserve the opcode etc.


Yann Guidon / YGDES wrote 11/04/2018 at 07:18 point

OK, not a transparent latch, but a DFF and a mux, plus some logic to control it.

-- DFF, every cycle :

SND_latched <= SND_field;

LDCx_flag <= '1' when (LDCx_flag='0' and opcode=opc_LDC and writeBack_enabled='1')   else '0';

-- MUX2 :

WriteAddress <= SND_latched when LDCx_flag = '1' else SND_field;

______

Note : LDCx into PC must work without wait state because it's connected directly to SRI, as an IMM8, and no extra delay is required. PC wait state is required for ADD/ROP2/SHL and IN.
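For readers who don't speak VHDL, here is a cycle-by-cycle Python model of the snippet above. It is one reading of the fragment, not a translation verified against the real sources: both `SND_latched` and `LDCx_flag` are treated as clocked registers, the MUX2 is combinational, and the opcode names in the trace are placeholders.

```python
OPC_LDC = "LDC"  # placeholder for the opc_LDC constant of the VHDL snippet


def write_addresses(trace):
    """Return the WriteAddress seen in each cycle of the trace.

    trace: list of (opcode, snd_field, writeback_enabled) tuples, one per cycle.
    """
    snd_latched = 0
    ldcx_flag = 0
    result = []
    for opcode, snd_field, writeback_enabled in trace:
        # MUX2 (combinational): the second LDCx cycle reuses the latched address.
        result.append(snd_latched if ldcx_flag else snd_field)
        # DFFs, updated at the end of every cycle.
        next_flag = 1 if (ldcx_flag == 0 and opcode == OPC_LDC
                          and writeback_enabled) else 0
        snd_latched = snd_field
        ldcx_flag = next_flag
    return result


trace = [
    ("ADD", 2, True),   # ordinary instruction, writes register 2
    (OPC_LDC, 5, True), # first LDCx cycle, destination is register 5
    ("junk", 1, True),  # second LDCx cycle: the instruction word is not valid
    ("ADD", 3, True),   # back to normal
]
addresses = write_addresses(trace)
print(addresses)
```

In the third cycle the write address is still 5 even though the SND field of the (meaningless) instruction word says 1, which is the purpose of the DFF plus MUX.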


Frank Buss wrote 10/27/2018 at 12:51 point

Do you really plan 8 byte-wide registers? This would require thousands of relays :-)


Yann Guidon / YGDES wrote 10/27/2018 at 14:26 point

no :-)

8 registers, 8 bits each = 64 storage bits.
1 relay per bit => 64 relays


The trick is to use the hysteretic mode of the relays :-)


Frank Buss wrote 10/27/2018 at 16:17 point

Ok, makes sense. Maybe change the project description, someone might think you are planning a 64 bit architecture.
BTW, could this be parametrized for the address and data size? If you implement it in VHDL, you could use generics for this, would be no additional work to use just the generic names instead of hard coded numbers. Except maybe some work for extending the instruction opcodes.


Yann Guidon / YGDES wrote 10/27/2018 at 17:16 point

Frank : DAMNIT you're right !

I updated the description...


Yann Guidon / YGDES wrote 10/27/2018 at 17:19 point

For the parameterization : it doesn't make sense at this scale. Every fraction of bit counts and must be wisely allocated.

Larger architectures such as #YASEP Yet Another Small Embedded Processor and #F-CPU have much more headroom for this.


Bartosz wrote 11/08/2017 at 16:40 point

Will this work on Epiphany or oHm or another cheap machine?


Yann Guidon / YGDES wrote 11/08/2017 at 18:07 point

I'm preparing a version that would hopefully use less than half of a A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !


Bartosz wrote 11/14/2017 at 14:13 point

The question is the price and the possibility to buy.


Yann Guidon / YGDES wrote 11/14/2017 at 16:08 point

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCB and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around 1/2$ a piece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D

