03/14/2020 at 04:50 •
My efforts in "Pushing more bubbles, now the carry-lookahead adder" were promising, but a bug somewhere made them vain. So I restarted from scratch instead of digging too deep into my own code. That's how I came up with the results of "Bitslice", and now I have a big advantage: I can have arbitrary polarity of the inputs and outputs of the CLA logic, and I have more freedom to choose the gates.
Also, this time I should make more progressive alterations to the design to catch errors earlier: I should test it bit by bit and build the exhaustive test at the same time :-)
All the G and P inputs benefit from the inverter (which is in fact the output of the NOR2 or NAND2 of the ROP2 bitslice), and they all have a fan-in of 1, so the NOR2 and NAND2 only need a fanout of 2.
The other signals have a fan-in of 2, 2, 3, 1, 2 and 0, which is reasonable. Cin has a serious fanout, but it is ready much earlier, so it's not critical.
For P2(0) I replaced the AND3 with a NOR3. This provides the signal earlier than G2(0), because the inverted inputs arrive one inverter earlier (and there is only one driver layer).
The second block of the CLA is quite similar (which is not surprising, since it is more or less copied from the LSB part).
The same recipe is applied: the AND-OR is replaced by NAND-NAND, and the AND3 by a NOR3.
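These rewrites are De Morgan's laws at work. As a quick sanity check, here is a minimal Python sketch (not part of the original tool flow) that compares each original pattern with its rewritten form on every 0/1 input combination. The gate helpers `NOT`, `AND`, `OR`, `NAND` and `NOR` are hypothetical names defined only for this check.

```python
from itertools import product

# Hypothetical 0/1 gate helpers, defined only for this check.
def NOT(a):
    return 1 - a

def AND(*xs):
    return int(all(xs))

def OR(*xs):
    return int(any(xs))

def NAND(*xs):
    return NOT(AND(*xs))

def NOR(*xs):
    return NOT(OR(*xs))

def check_rewrites():
    """Compare each original gate pattern with its rewritten form on all inputs."""
    for a, b, c, d in product((0, 1), repeat=4):
        # AND-OR (sum of products) == NAND-NAND
        assert OR(AND(a, b), AND(c, d)) == NAND(NAND(a, b), NAND(c, d))
        # g1 | (p1 & g0) with g1 arriving inverted: NAND2(/g1, NAND2(p1, g0))
        assert OR(a, AND(b, c)) == NAND(NOT(a), NAND(b, c))
        # AND3 on true inputs == NOR3 on the inverted inputs (the P2(0) rewrite)
        assert AND(a, b, c) == NOR(NOT(a), NOT(b), NOT(c))
    return True

print(check_rewrites())
```

The inverted inputs fed to the NOR3 correspond to the "free" inverted G and P signals coming out of the ROP2 bitslice.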
I had to insert two inverters; one is on G2(1), because it's used on the inverting input of 2 blocks, so I renamed that signal /G2(1).
All the inverted input signals are used, and only once, just like before.
The last bits and the carry output are pretty similar:
Note how each G and P input is used only once for each polarity.
The gates on the left have some freedom for re-interpretation.
The carry output is XORed, but there is no penalty because the other outputs are XORed as well, in the ROP2 and SUXEN levels.
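In the spirit of building the exhaustive test alongside the design, here is a behavioural sketch in Python of an 8-bit lookahead adder, checked against plain integer addition for all 2^17 input combinations. The grouping (2-bit groups G2(k)/P2(k) with a flattened lookahead between groups) and the OR form of the propagate signal are assumptions for illustration; the helper names are hypothetical and this is not the gate-level netlist of CLA8_NAND.cjs.

```python
def bit(x, i):
    """Bit i of integer x as 0/1."""
    return (x >> i) & 1

def group_carry(G, P, cin, k):
    """Carry into group k, flattened: OR of G[j] & P[j+1..k-1] terms, plus cin & P[0..k-1]."""
    carry = 0
    for j in range(k):
        term = G[j]
        for m in range(j + 1, k):
            term &= P[m]
        carry |= term
    term = cin
    for m in range(k):
        term &= P[m]
    return carry | term

def cla8(a, b, cin):
    """8-bit add with 2-bit groups and a flattened lookahead across groups."""
    g = [bit(a, i) & bit(b, i) for i in range(8)]   # bit generate
    p = [bit(a, i) | bit(b, i) for i in range(8)]   # bit propagate (OR form)
    G2 = [g[2 * k + 1] | (p[2 * k + 1] & g[2 * k]) for k in range(4)]
    P2 = [p[2 * k + 1] & p[2 * k] for k in range(4)]
    c = [0] * 9
    for k in range(4):
        c[2 * k] = group_carry(G2, P2, cin, k)            # carry into the pair
        c[2 * k + 1] = g[2 * k] | (p[2 * k] & c[2 * k])  # carry inside the pair
    c[8] = group_carry(G2, P2, cin, 4)                    # carry out
    s = 0
    for i in range(8):
        s |= (bit(a, i) ^ bit(b, i) ^ c[i]) << i         # sum still needs the XOR of a and b
    return s, c[8]

def exhaustive_test():
    """Compare against integer addition for every a, b and carry-in."""
    for a in range(256):
        for b in range(256):
            for cin in (0, 1):
                total = a + b + cin
                assert cla8(a, b, cin) == (total & 0xFF, total >> 8), (a, b, cin)
    return True
```

With the OR form of propagate, the sum bits must still use `a ^ b`, which mirrors the need for the XOR in the ROP2 level.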
The new source is here: https://cdn.hackaday.io/files/272801167147520/CLA8_NAND.cjs
03/13/2020 at 00:24 •
The ROP2 and ALU part has been slowly expanding to the SUXEN, but log 70, "The nexus", reminds me that something is missing: I forgot to include the PC+1 (NPC) value. So another level of MUX is required, which is fortunate because I had also left out the SHL result. I can then use another full MUX3.
In a previous log, "ROP2 with Falstad", I came up with this diagram:
and it seems it must be extended a bit with another layer of MUX3 (source)
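A MUX3 of this kind can be sketched as the classic two-level NAND structure: each input is gated by its own enable in a NAND2, and a NAND3 combines the three results. With one-hot enables, this selects one input; with all enables low, the output is 0, which is what lets a disabled unit be ORed with the others. This is a minimal behavioural sketch; the names `mux3_nand` and `mux3_byte` are hypothetical, and which unit (NPC, SHL, ROP2/CLA) feeds which input is not specified here.

```python
from itertools import product

def nand(*xs):
    """NAND on 0/1 integers."""
    return 0 if all(xs) else 1

def mux3_nand(a, b, c, en_a, en_b, en_c):
    """Two-level NAND MUX3: NAND3 of three enable-gated NAND2, i.e. an AND-OR."""
    return nand(nand(a, en_a), nand(b, en_b), nand(c, en_c))

def mux3_byte(a, b, c, en_a, en_b, en_c):
    """One mux per bitslice, the 8 slices sharing the same enables."""
    out = 0
    for i in range(8):
        bit = mux3_nand((a >> i) & 1, (b >> i) & 1, (c >> i) & 1, en_a, en_b, en_c)
        out |= bit << i
    return out

def check_mux3():
    """One-hot enables select one input; all enables low give 0."""
    for bits in product((0, 1), repeat=3):
        for sel in range(3):
            en = [0, 0, 0]
            en[sel] = 1
            assert mux3_nand(*bits, *en) == bits[sel]
        assert mux3_nand(*bits, 0, 0, 0) == 0
    return True
```

If two enables were high at once, the output would be the OR of both inputs, so the decoder must keep the enables one-hot (or all low for "clear").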
The critical datapath (CDP) of the whole stack is about 10 simple gates, and I have not counted the CLA or the IN port.
It's pretty satisfying to see that whole "datapath" in one picture, at last :-)
You can see a lonely inverter on the /X signal. This is an optional correction for the output polarity of the CLA. It can be omitted if needed: it's on the "slow path" and provides some degrees of freedom for the CLA design.
Speaking of slow paths: there is one OR just before the ROP2_out signal, but it looks incompressible and not critical, so I leave it there. The input XOR for SND is critical, though. I'll see how I can reduce the output XOR from the CLA; there is a fun trick to play with BJTs ("enable" by playing with the CLA_EN signal tied to the pull-up resistor of the interlocked pair).
Total gates: 20 (13 NANDx, 2 XOR, 3 INV, 1 OR, 1 NOR).
That's 160 gates for the 8-bit datapath (ignoring the CLA and SH circuits). It looks pretty easy to lay out and route, but the outputs and the inputs will be located on the same side to ease routing of the register set. I'll probably move to a 3-tile-high organisation for FPGA & ASIC.
I might have found a trick to save a bit of stuff somewhere :-)
The idea is to combine NEG and PASS_EN at the XOR input level, which saves one NAND2 and reduces the NAND3 to a NAND2 (which can also add one more input to the datapath if needed). However, /L is still needed somehow, somewhere, but an XOR contains 2 inverters anyway. NEG and PASS_EN can be controlled at the decoder level, and the other inverter is moved/shared.
Oh, and I also replaced the OR (for AND_EN) with a NAND2: one input has an inverter, while the other input can be inverted at the decode level. (source)
PASS_EN is renamed to PASS_SND because it makes more sense.
I have also added the zero detection. That would be an OR8 (one way or another).
Here we see the signal going from SRI to the Result output: NEG and PASS_SND are disabled, so the value flows through the OR logic, and OrXor_en, ROP2_en and MX_en are enabled.
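One possible decomposition of that OR8 into the preferred gate sizes (an assumption, not the log's netlist): two NOR3 and one NOR2 on the result bits, combined by a NAND3 whose output is 1 when any bit is set, i.e. the OR8 itself. The zero flag is its complement, which can cost one more inverter or be folded into the consumer. The helper names are hypothetical.

```python
def nor(*xs):
    """NOR on 0/1 integers."""
    return 0 if any(xs) else 1

def nand(*xs):
    """NAND on 0/1 integers."""
    return 0 if all(xs) else 1

def not_zero(r):
    """OR8 of an 8-bit result as NAND3(NOR3, NOR3, NOR2): 1 if any bit is set."""
    b = [(r >> i) & 1 for i in range(8)]
    return nand(nor(b[0], b[1], b[2]), nor(b[3], b[4], b[5]), nor(b[6], b[7]))

def zero_flag(r):
    """Z = 1 when the 8-bit result is 0."""
    return 1 - not_zero(r)
```

This is two gate levels deep, with no gate wider than 3 inputs.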
In the decoder's logic, AND_EN, NEG and PASS_SND are affected. This removes several "don't care" situations.
Op      NEG  PASS_SND  AND_en  XOR_en  OrXor_en  ROP_en  CLA_en  IN_en  MX_en
OR      0    1         0       0       1         1       0       0      1
XOR     0    1         0       1       1         1       0       0      1
AND     0    1         1       1       0         1       0       0      1
ANDN    1    0         1       1       0         1       0       0      1
SUB     1    0         x       1       1         0       1       0      0
ADD     0    1         x       1       1         0       1       0      0
PASS    0    0         x       x       1         1       0       0      1
IN      x    x         x       x       x         x       0       1      0
clear   x    x         x       x       x         x       0       0      0
03/09/2020 at 03:15 •
Reader warning: this log/post touches the fundamental things that make me the most passionate about digital design and architecture. Playing with TTL chips, relays and transistors is a fun game, but here you get a glimpse of some damned serious matters. This log justifies several aspects of my design choices, so strap on your seatbelt and learn a few things.
I aim at building the YGREC8 with various technologies (mainly for fun and giggles) but with the same ISA, so the different implementations can execute the same programs, and with the same structure and even the same gatelist (except for the relay version). This means that I focus on the manual synthesis of the design, and I break every function down into individual gates. I choose the lowest common denominator for the chosen technologies, then I reuse the same gatelist without trying to overoptimise too much for each target...
This means I must also choose the right structures and keep them (except for the relays). For example, the ALU will be (mostly) identical, with the same CLA, because I don't want to re-engineer the system for every new implementation.
The ProASIC3 and the relay versions favour the MUX2 as the atomic, do-everything gate, but I intend to use ASIC/CMOS as well as discrete bipolar transistors, which require simpler gates.
- Bipolar gates really prefer the NAND function. It's really the simplest, so it should be the fastest...
- CMOS ASIC technology loves both NAND and NOR (they are symmetrical), but they have a practical limit on the number of inputs. Apparently 3 is a compromise between size and speed, because more inputs would put too many transistors in series, which would slow down the gate, or force the channels to be too wide to compensate (and increase capacitance).
So the "preferred gates" are NAND2 and NAND3.
Others like latches, NOR, INV and XOR are accepted where needed. For example I have studied the structure of the latches and XOR in several previous logs on other projects (for example the XOR zoo)
Having more inputs to the NAND would be a big benefit to reduce the size and increase the overall speed:
- This lets MUX have more inputs and fewer levels, which is better
- The carry lookahead (and incrementer) can have a coarser granularity, fewer levels and a shorter critical datapath
- and I'm probably forgetting a few other units; SHL would be a good candidate as well.
Bipolar discrete circuits can have many inputs: 4 would not be a concern, and maybe 8 is possible before running into signal integrity issues. The question is: is it a good choice for CMOS?
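The "fewer levels" argument can be put in numbers with a small sketch: the depth of a balanced reduction tree (for example an OR/MUX combining tree) is the smallest d such that fan_in^d reaches the number of inputs. This counts only gate levels of one kind and ignores the inversions needed when alternating NAND and NOR levels, as well as wiring; the `levels` helper is hypothetical.

```python
def levels(n_inputs, fan_in):
    """Depth of a balanced reduction tree of fan_in-input gates (integer maths only)."""
    depth, reach = 0, 1
    while reach < n_inputs:
        reach *= fan_in
        depth += 1
    return depth

for n in (8, 16):
    print(n, "inputs:", {k: levels(n, k) for k in (2, 3, 4, 8)})
```

Going from 2 to 3 inputs saves a level for 8 inputs (3 levels down to 2), while going from 3 to 4 inputs only pays off at 16 inputs (3 levels down to 2).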
I have recently received an answer from @Staf Verhaegen :
"Multi-input cells are mainly power and area optimization and not performance. Area optimization is trivial due to reduced number of transistors; power optimization is due to the removal of internal switching nodes.
I haven't looked deeply in maximum number of series transistors in a design but typically one does not go above four. Going more would need big transistors and likely not that much would be used by synthesis anyway."
Thank you for the context expansion :-)
Let's see how/why this is so.
CMOS obeys a few rules, in particular τ = RC, so the goal is always to minimise resistance and capacitance.
Capacitance comes from the gate regions, because the thin area where the poly overlaps the diffusion forms a capacitor. The smaller the gate, the faster.
However, resistance depends on the relative width of the area through which current flows. The smaller the section, the less current flows, so the width must be maximised to make fast circuits.
So there is this basic compromise : if you make a transistor wider you increase the current hence the speed but this also increases the capacitance, which reduces the speed...
And this is for one transistor. CMOS gates need transistors in parallel and in series! The more inputs, the more transistors in series, and the more resistance, which also reduces the speed...
- 2-input gates are a bare theoretical minimum for making a circuit. You won't go far with 1-input gates.
- 3-input gates are an extension of the 2-input version. Just enlarge the gates a bit to keep speed in check.
- 4-input gates can concentrate more values but need 2× larger gates than the 2-input version to keep the speed. But 2× larger gates also present 2× the input capacitance, so the previous stage needs 2× the drive strength, or the propagation time grows. A compromise is required, such as 1.5× width for about a 1.5× increase of delay, but this also requires spraying the chip with more buffers/inverters to boost the signals anyway.
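This trade-off can be quantified with the textbook logical-effort model (Sutherland, Sproull and Harris), which is a swapped-in technique, not something used in this log. Assuming a PMOS/NMOS mobility ratio of 2, an n-input NAND has a logical effort g = (n+2)/3 and an n-input NOR g = (2n+1)/3, with a parasitic delay p of about n inverter units; a stage delay is then d = g·h + p for an electrical effort (fanout) h. The sketch below only evaluates these first-order formulas.

```python
from fractions import Fraction

def nand_effort(n):
    """Logical effort of an n-input NAND (PMOS assumed 2x weaker than NMOS)."""
    return Fraction(n + 2, 3)

def nor_effort(n):
    """Logical effort of an n-input NOR under the same assumption."""
    return Fraction(2 * n + 1, 3)

def stage_delay(g, h, n):
    """Normalised stage delay d = g*h + p, with parasitic delay p ~ n."""
    return g * h + n

for n in (2, 3, 4):
    d = stage_delay(nand_effort(n), 4, n)
    print(f"NAND{n}: g = {nand_effort(n)}, NOR{n}: g = {nor_effort(n)}, "
          f"NAND{n} delay at fanout 4 = {float(d):.2f}")
```

Under these assumptions a NAND4 driving a fanout of 4 costs 12 units versus about 7.3 for a NAND2, so each wide stage is slower; whether one wide stage beats two narrow ones depends on the loads, which this sketch does not model.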
Conclusion: 4-input first-kind gates (NAND4 and NOR4) are good for reducing power and area (hence costs) but don't help with performance in CMOS. That would however be an interesting option for a power- and cost-optimised version, where NAND4 is preferred every time it's possible. And given today's technology, that power reduction could have great benefits, as AMD found with their latest CPU generations: reducing the power allows you to pack more cores, which outweighs the loss of individual performance.
Of course it's still possible to plug some 4-input gates in some critical places to "get things done". For example, the current SUXEN has a MUX3 and doesn't get the SHL unit. Adding a NAND4 would easily add the SHL to the result bus. Upon closer inspection, PC+1 needs to be sent as well, so another layer of MUX3 would still work well...
I find AND4 and NAND4 in some standard cell libraries. For example, the SXLIB has NAND3 and NAND4 gates :
The size bloats however when a higher output driver strength is required, with 2 inverters...
However, increasing the number of inputs also increases the density of the control wires, which can make routing more difficult. It's all a matter of balance...
Looking at the above cells I notice that there is quite a significant area that is not used by the diffusion. This could save 20 or 30% of total die area if it was trimmed, with corresponding savings in costs and probably speed. This area is used by more complex gates that need more internal logic layers (XOR, DFF...). So I wonder if/how it is possible to make a "reduced count" gate library with only "low-profile" gates... A sort of "RISC" method applied to CMOS ? :-D
03/07/2020 at 01:12 •
Note: this log is obsoleted by Bitslice
After the last log, "Adder with Falstad", I also converted the ROP2 bitslice to the interactive simulator:
There is still the challenge to disable the output of the CLA so it can be combined by ORs with the rest of the units.
My guess was to decompose the final XOR and include the "enable" signal at the end of the combining gate, but I got the wrong polarity. Instead I have this solution for XA1:
However, the fanout of the "enable" signal is doubled, and this feeds 2 gates with 3 inputs, which uses more space...
It's hard to reduce the XOR gate, and each technology has its own tricks up its sleeve to implement it, so I prefer to keep the XOR as is, and the output is MUXed with a classic 2-level NAND circuit:
The truth table is updated:
Op      NEG  PASS_en  AND_en  XOR_en  OrXor_en  ROP_en  CLA_en  IN_en
/OR     0    x        x       0       1         1       0       0
\OR     0    x        0       x       1         1       0       0
XOR     0    0        1       1       1         1       0       0
AND     0    0        0       1       0         1       0       0
ANDN    1    0        0       1       0         1       0       0
SUB     1    x        x       1       1         0       1       0
ADD     0    x        x       1       1         0       1       0
PASS    x    1        x       x       0         1       0       0
IN      x    x        x       x       x         0       0       1
clear   x    0        0       0       0         0       0       0
Simulation with Falstad helped uncover some non-trivial "don't care" states.
The PASS code is back to "ROP2 land" but this shouldn't create too many problems in the decoder.
I added the IN instruction and left the SH codes for a future version, so it fits with the final NAND3 gate.
Some signals such as AND_en are updated because the ROP2_en "shadows" them.
It seems XOR_en and AND_en could have their names swapped? AND_en is 1 only for XOR, and XOR_en is 0 only for OR...
03/05/2020 at 23:15 •
So Falstad is a reasonably potent logic simulator that lets me input schematics easily and test them.
I had hit a bug in the ALU8's CLA when porting the #VHDL library for gate-level verification back to #YGREC8, and moved on to other parts of the project because I didn't feel the energy to go back through all the optimisations I made. But thanks to Falstad I can do it interactively...
So I went back to the main diagram and rebuilt the whole thing in Falstad's simulator:
The source code is so large I can't add it as a link so it's in the file Add8.cjs.
Slowly, little by little, I can resume the "bubble pushing" that created the nasty bug, but this time I can avoid it :-)
03/05/2020 at 11:37 •
You must know Falstad's circuitjs simulator; I've been using it for some weeks for analogue designs. It's not perfect, and I have found quite a few quirks, but it is also a logic simulator: not a highly powerful one, but capable and interactive!
It took little time to draw the schematic of the INC8 unit and now I wonder why I wasted so much time doing it with Dia when I could also simulate the result and provide the source code (click here !)
With a few clicks I was even able to see where I made a mistake in the wiring.
I had stumbled upon a roadblock with the ALU8 and was feeling too lazy to make a deep analysis of my mistakes; Falstad's circuitjs looks like the handy solution to that :-)
Now, is it possible to convert the netlist (extracted by my new tool) to display it with Falstad ? Or vice versa ?
01/30/2020 at 20:35 •
It strikes me only now that I must have under-estimated the importance of magnetic interferences between relays...
I remember seeing placement recommendations for miniature Chinese relays, but the РЭC-64 has a tubular shield. How do the openings at both ends behave?
Having two relays on a well-spaced prototype breadboard can't show the effects of many relays packed densely and switching with weird patterns.
I'm starting to consider using mu-metal sheets but I wonder if it's practically effective and the right solution, because I still have some freedom to organise the parts in space and optimise the magnetic field...
Should I start playing with, or even build, a flux-meter?
Here is the seller for the RES-64:
It's an SPST reed relay: the two contacts exit the glass tube at opposite ends. The glass tube would be surrounded by the electromagnet coil, and the whole is inserted in the metallic tube to further direct the magnetic field and shield a bit from outside influence. I still have to examine a non-working piece to confirm. An additional pin connects the "case" to ground (for example).
In the expected configuration, the relays will be paired and receive the same current, except during a set/reset pulse.
I'll have to check and measure the magnetic field at the ends of the relays. That's one excellent reason to finally use all those UGN3503 I bought for another project !
In the end, working in pairs might solve the problems I imagine so far.
One way to see it is with both relays forming a magnetic loop, to close the static field. I'll just have to find a way to loop the magnetic field, for example by cutting a torus in half. This ensures that the pair of relays is closely coupled, little energy will leak to the closest neighbours.
However, I suspect that the real problem is not the static field but the pulsed/forced changes when a capacitor discharges. This is what can affect sensitive neighbour relays but there is a catch : one relay is pulsed with the opposite polarity of the twin relay... In ALL cases, the magnetic pulse will go against the static field of one relay, while also reinforcing the field of the other relay. There must be an opposition of fields somewhere, a magnetic "hot spot" that can interfere with the nearby relays.
Ideally the programming pulse should have the same polarity for both relays. First it would prevent/limit the cases where one relay has a state different from its twin, in particular during power-on. Second : it would allow the magnetic pulse to be "looped" in a closed magnetic circuit, thus removing many causes of magnetic leakage and interference. The problem is that it would easily double the power drawn by the register set, since there would be 6V to be dropped in resistors... The whole register set would dissipate 2W instead of 1W.
Yes this log needs more drawings...
I just found new information in a totally awesome book dedicated to relays !
The book is "Electric Relays Principles and Applications" by Vladimir Gurevich, and it covers Western as well as Soviet relays. A truly fascinating encyclopedia that turns an apparently dumb device into a marvel of engineering!!!
Notice element n°5: what is a ferroelastic disk? Anyway, it might prevent the magnetic field from escaping from one end, which is also great to reduce interference from neighbouring switching relays...
01/24/2020 at 16:53 •
The relay-based version of the YGREC-8 was in limbo due to delays in the delivery of required parts. I'm expecting more RES-64 to arrive in a few weeks, after ridiculous back-and-forth between post offices on strike. Meanwhile I was able to progress with #VHDL library for gate-level verification in amazing ways but... My soldering iron is asking for action !
Fortunately I received other parts from Russia (thank you eBay !) and I'm listing them to keep track of their intended use.
Those parts are pretty oversized, compared to today's technology, but the looks/appearance/style is worth it and the whole will be coherent ;-)
20× PETP K73-16 63V 2.2µF
These are non-polarised capacitors with medium value.
They are useful for 2 cases :
- for CCPBRL: the coupling between stages requires a capacitor but a polarised one forces the use of two shifted power supply domains. Non-polarised capacitors simplify the power supply design, as well as logic design in some corner cases. However, 2.2µF might not cut it for the RES-15...
- for the high-fanout buffers such as the ones described below: some simulations with Falstad have shown that a high value would create an oscillation that could interfere with the rest of the circuit. A low value, however, wouldn't transfer enough energy from one side to the other. In both cases, the purpose is to prevent arcing at the contacts of the control relay at the bottom of the drawing.
I don't think the YGREC8 needs 20 high-fanout signals but at least I'll be ready. The data memory system requires 5 buffers, the instruction memory might need a few more, but it is a reasonable approximation.
Of course I'll have to experiment, test, verify, measure... I expect to make another video when it's done :-)
Verdict : great surprise !
These capacitors aged very well and maintained excellent isolation as well as precise capacitance : +/- 2% worst case ! I don't know about the inductance but it should work very well.
20× Inductances Kig 0.1 1000μH
High-value, low-current inductors
These parts will "isolate" the various bitplanes from the main power supply.
Each of the 8 bitplanes contains at least 16 RES-64 to store the values from the register set, but these planes are quite sensitive to external interferences. A "pi" network is used : each bitplane has a local large-value capacitor, added to the large value of the power supply, and the bitplanes can emit and receive pulses that could flip other states...
The current rating is low but compatible with a single bitplane: each relay uses approx. 2.5mA, a total of 20mA, which gives a 5× margin with this 100mA part.
Verdict : good !
8.5-9.1 ohms is a bit much, but the inductance is around 960µH, with a few percent of variation.
It should work well...
8x Capacitor K50-24 16 V 2200µF
high-value, medium-voltage power-storage capacitor
There are 8 of them, just as needed for the 8 bitplanes. Ideally they filter the power to the 16×RES-64. However due to the high capacity and the low current rating for the inductor, there is the risk of blowing up the inductor in the case where the input is (accidentally) shorted. A series germanium diode (or 2 in parallel for higher current capacity and longevity) would prevent the damage.
These parts aged but should be "good enough", with a self-resistance (leakage) around 1MΩ and an average capacitance in the 1600-1800µF range. ESR might be high, though. They can be used for local power supply filtering.
I have "reformed" the capacitors through the slow and long application of current and the leakage has been significantly reduced.
I applied this method to the larger caps below as well.
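As a rough check of the "pi" network on one bitplane, here is a sketch using the values measured in this log: about 960µH and 8.5-9.1Ω for the Kig 0.1 inductor, 1600-1800µF for the K50-24, and the estimated 20mA bitplane current. It treats the inductor's winding resistance, the inductor and the local capacitor as a simple series RLC, an assumption that ignores the capacitor's ESR and the main supply's own capacitor, and computes the resonance frequency, the quality factor and the DC drop.

```python
import math

L = 960e-6    # H, measured inductance of the Kig 0.1 part
R = 8.8       # ohm, middle of the measured 8.5-9.1 ohm winding resistance
C = 1700e-6   # F, middle of the measured 1600-1800 uF range of the K50-24
I = 0.020     # A, estimated bitplane current

f0 = 1 / (2 * math.pi * math.sqrt(L * C))   # LC resonance frequency
q = math.sqrt(L / C) / R                    # series RLC quality factor
drop = I * R                                # DC drop across the winding

print(f"f0 = {f0:.1f} Hz, Q = {q:.3f}, drop = {drop:.3f} V")
```

Under this model, the resonance sits around 125Hz and Q is far below 0.5, so the local network is overdamped and should not ring; the DC drop stays below 0.2V.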
8× K50-16 25 V 5000µF
These big babies belong obviously in the power supply.
Not much more to say. It's going to be quite massive but I want to avoid as much ripple as possible.
There will be several voltages so at least a couple will be used for each rail, depending on the needed current.
Something bad happened, probably a breach, because there is liquid in the bag. Probably the electrolyte...
I'll have to get others and/or use my "modern" stock.
40× K50-20 100 V 10uF
Those are temporary bit storage caps for the Flip-Flops. 4.7µF would work too.
The value would be charged through a resistor (TBD) to reduce interference and spikes on the supply rail. The capacitor is discharged in the middle point of the two RES-64 in series, acting as inductors, so there is current reduction to add on that side.
Niobium is an alternative to tantalum, so the ESR is much lower than that of the electrolytic capacitors, and it should be able to "kick" the current pretty fast and counter the reverse EMF.
BTW : the + pole is at the black end (thanks @Artem Kashkanov !)
I need 8 of them for the write-back latch that feeds the write port of the register set, but other capacitors are required for the other Flip-Flops in the system, such as the PC counter, even counters, ...
Some parts seem to have a bit of gunk but appear functional.
Capacitance is good, ranging from 10 to 11.7µF.
Resistance/leakage is easily measurable (>1M) but good enough to work for short pulses.
Extra/leftover parts will reinforce the filtering of the power supplies.
I also have received 30× axial tantalum K53-14 10µF 10V capacitors. The average capacitance is measured at 9µF (+/- 0.3µF) and it's smaller so more practical for the latches.
20× Niobium K53-4 20 V 47µF
Clearly useful to complement the large, high-ESR electrolytic capacitors for filtering the power supplies here and there (for the low and medium voltages up to around 12V). Two in series are mandated for filtering the 24V rail, made of a symmetric +12V/0V/-12V supply.
BTW : the + pole is at the black end (thanks @Artem Kashkanov !)
These parts aged, and even if one is 60µF, many others are under 47µF, in the 37-43µF range.
There is some tiny current leakage, but these should be good enough.
So that's it.
It's going to be a bold, large, bloated but exquisitely exotic board...
01/12/2020 at 21:07 •
I'm considering writing a quick&dirty&short behavioural simulation of the Y8 core to get a refreshed higher level view (as well as an alternate platform that simulates faster to try code out). I've already written one 2 years ago but it left a lot to be desired. Today's core has a more refined ISA and many details have evolved and matured. And I have a cool assembler I can throw at more challenges!
The core is still simple anyway, with "most instructions executing in a single cycle", with 2 exceptions:
- Write to PC (except SET and CALL that can write directly to PC and bypass the slow ALU path)
- LDCL/LDCH that need one more cycle to read the instruction memory.
And then you realise you haven't thought about LDCx writing to PC, which makes things even more complex, but who would do that? (Well, if 512 instructions of the 64K space allow it, there is 1 chance in 128 that it gets executed, so it's not negligible.)
A proper FSM is clearly required. And soon, it appears that it's not going to be a nice and tidy FSM like you see in tutorials because I need 2 FSMs.
- One FSM handles the instructions and the special cases : stop/inst/LDCx/WritePC.
- Another FSM handles the execution state: start/step/stop/(re)set/load
These can't be completely merged because they are orthogonal. The execution FSM is what is visible from the outside, particularly during debug sessions. It must also handle internal initialisation when the chip comes out of /RESET (like: load data from external memory). The instruction cycle FSM takes care of sequencing single and complex instructions.
So it makes sense to separate the 2 FSM because it untangles the complex combinations that might arise and prevents naughty bugs and race conditions.
Before going further, let's recall how the Y8 works: there is no real pipeline, yet 2 phases overlap:
- A first phase computes the next address (Increment PC) and/or gets a new address from an instruction, then fetches the instruction memory to read the next instruction.
- The second phase gets a ready instruction (or immediate word), then decodes/fetches operands/executes/writesback stuff.
The FSM must ensure that the instruction memory is fetched before starting the execution phase.
It gets more complex with the LDCx instructions because another round goes through the instruction memory. And if the destination is PC, yet another cycle is added to fetch the instruction instead of using PC+1.
OTOH, the core state FSM has to receive orders from outside, latch them and send the appropriate commands to the other FSM. Reset is asserted there, since no "general RESET" signal is propagated through the whole chip, which further simplifies the implementation.
This FSM receives a code from outside and latches it before interpretation. During debug, it deals with the usual start/step/stop commands, though it defaults to start after power-up. A fourth command can be RESET to force a reload of the program memory (if you change the source code to debug or if INV wants to switch to a different code page).
So we get the following command codes:
Command  Code  Description
Start    11    Run the system.
Step     10    Execute one instruction, then pause the core.
Stop     01    Pause the core at the end of the instruction.
Reset    00    Pause the core and reset the FSM (reload instructions when another command is received).
The default command is Start when the core wakes up, and it can be modified (from input pins or through the debug interface).
The FSM can have more states to deal with initialisation. It is clocked by (and synchronised to) the core clock. Its value should be read back by the user after every command to ensure the proper value is latched and the FSM is in a coherent state.
The five states (so far) are:
- Reset: just do nothing; clear PC and some eventual SR flags. Reached when the RESET command or the external reset signal is received.
- Load: copy external data to … and increment PC for each … Reached when the external Reset signal is de-asserted and a command (other than Reset) is received.
- Stop: wait, do nothing. Reached when Load is finished (PC overflow), when Stop is received in Run, or from the Step state.
- Step: execute one instruction. Reached when the Step command is received.
- Run: let the core do its thing. Reached when the Start command is received in the Stop state.
I hope it's clear.
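Here is a minimal behavioural sketch of this execution FSM in Python, one call per core clock. It assumes that `cmd` is a newly latched command, or `None` when no new command arrived (the log doesn't say how a command is "consumed"), that Step lasts exactly one clock, and that Stop takes effect immediately rather than at the end of a multi-cycle instruction. The names `Cmd`, `ExecState` and `exec_next` are hypothetical.

```python
from enum import Enum
from typing import Optional

class Cmd(Enum):
    """2-bit command codes from the table above."""
    RESET = 0b00
    STOP = 0b01
    STEP = 0b10
    START = 0b11

class ExecState(Enum):
    RESET = "Reset"
    LOAD = "Load"
    STOP = "Stop"
    STEP = "Step"
    RUN = "Run"

def exec_next(state: ExecState, cmd: Optional[Cmd],
              ext_reset: bool, load_done: bool) -> ExecState:
    """Next state of the execution FSM for one core clock."""
    if ext_reset or cmd is Cmd.RESET:
        return ExecState.RESET
    if state is ExecState.RESET:
        # Leave Reset once a non-Reset command is received.
        return ExecState.LOAD if cmd is not None else ExecState.RESET
    if state is ExecState.LOAD:
        # load_done models the "PC overflow" at the end of the copy.
        return ExecState.STOP if load_done else ExecState.LOAD
    if state is ExecState.STEP:
        return ExecState.STOP              # one instruction, then pause
    if state is ExecState.RUN:
        return ExecState.STOP if cmd is Cmd.STOP else ExecState.RUN
    # state is STOP: wait for Step or Start
    if cmd is Cmd.STEP:
        return ExecState.STEP
    if cmd is Cmd.START:
        return ExecState.RUN
    return ExecState.STOP

# Example: power-up with the default Start command, then loading.
state = ExecState.RESET
state = exec_next(state, Cmd.START, ext_reset=False, load_done=False)  # -> LOAD
state = exec_next(state, None, ext_reset=False, load_done=True)        # -> STOP
print(state)
```

Reset is tested first, so it takes priority over every other transition, matching "reset FSM" from any state.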
The instruction FSM has 4 states:
- IDLE: do nothing. Reached when opcode=INV in state INST, or when state /= Step and state /= Run.
- INST: decode and execute the instruction. Reached from INST, from LDCX when SND /= PC, or from IDLE when State=RUN or STEP.
- LDCX: cycle to read the instruction memory. Reached when state=INST and opcode=LDCH/LDCL.
- WrPC: extra cycle to fetch the instruction. Reached when state=INST or LDCX, and SND=PC, and opcode /= SET|CALL.
These states have different names to prevent confusion between both FSMs.
From there, the code is pretty easy to write...
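A minimal sketch of the instruction ("phase") FSM in Python, following the table above. The priority between overlapping conditions and the WrPC to INST transition (not listed in the table) are assumptions, and opcodes are plain strings for readability. Like the table as printed, it still lets IDLE go to INST when the opcode is INV, the flaw described just below. The names `Phase` and `phase_next` are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    IDLE = "IDLE"
    INST = "INST"
    LDCX = "LDCX"
    WRPC = "WrPC"

def phase_next(phase: Phase, exec_state: str, opcode: str, snd_is_pc: bool) -> Phase:
    """Next state of the instruction FSM; exec_state is "Run", "Step", "Stop", etc."""
    if exec_state not in ("Run", "Step"):
        return Phase.IDLE
    if phase is Phase.IDLE:
        return Phase.INST                 # table as printed: no INV check here
    if phase is Phase.WRPC:
        return Phase.INST                 # assumption: not listed in the table
    if phase is Phase.LDCX:
        return Phase.WRPC if snd_is_pc else Phase.INST
    # phase is INST
    if opcode == "INV":
        return Phase.IDLE
    if opcode in ("LDCL", "LDCH"):
        return Phase.LDCX                 # read the instruction memory first
    if snd_is_pc and opcode not in ("SET", "CALL"):
        return Phase.WRPC                 # extra cycle to fetch from the new PC
    return Phase.INST

# Example: an LDCL whose destination is PC goes INST -> LDCX -> WrPC -> INST.
phase = Phase.INST
for _ in range(3):
    phase = phase_next(phase, "Run", "LDCL", snd_is_pc=True)
    print(phase.value)
```

SET and CALL are excluded from WrPC because they write PC directly and bypass the slow ALU path.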
Note: splitting the FSM makes each part easier to run fast and less prone to bugs.
The size for storing the states is marginally larger, but the decoding logic is probably smaller.
The significant difference is the latency: there are about two clock cycles of delay between the arrival of a new command and the start of the execution of an instruction. This is not a severe problem because the debug interface will be significantly slower than the core.
Crude simulations show that the "Phase FSM" has a flaw that allows a transition from IDLE to INST when the opcode is INV. This condition should put the FSM into a different state that is exited when a new command is received... I'm updating the code & sketches.
This system will now clearly refuse to execute any INV instruction, even in debug mode.
01/02/2020 at 04:05 •
The ALU needs a new decoder because I changed the logic of the ROP2 unit... With a working decoder, I can re-implement the fault checker and validate all the opcodes.
20200103 : I updated files here and there, as well as the main page that contains the "official definitions", because I have changed the order of the boolean opcodes.
I must make a new lookup table that is more thorough than the one in "Bubble-pushing the ROP2" as well as the previous versions. There is a big difference: this time I use OR instead of MUX2 to combine the data, so there are fewer degenerate cases.
Op      Func  NEG  PASS_en  AND_en  XOR_en  OrXor_en         CLA_en  CMPS  WB  CryWr
OR      0000  0    0        0       0       1                0       x     1   0
XOR     0001  0    0        1       1       1                0       x     1   0
AND     0010  0    0        0       1       0                0       x     1   0
ANDN    0011  1    0        0       1       0                0       x     1   0
CMPU    0100  1    0        1       x       0                1       0     0   1
CMPS    0101  1    0        1       x       0                1       1     0   1
SUB     0110  1    0        1       x       0                1       0     1   1
ADD     0111  0    0        1       x       0                1       0     1   1
SET     1000  x    1        1       x       0         pass   0       x     1   0
CALL    1001  x    1        1       x       0         pass   0       x     1   0
SH/SA   1010  x    0        0       0       0         clear  0       x     1   0
RO/RC   1011  x    0        0       0       0         clear  0       x     1   b11 (RC)
LDCL/H  1100  x    1        1       x       0         pass   0       x     1   0
IN      1101  x    0        0       0       0         clear  0       x     1   0
OUT     1110  x    0        0       0       0         clear  0       x     0   0
INV     1111  x    0        0       0       0         clear  0       x     0   0
Some equations:
CLA_en = F2 & /F3
ROP2 = /F3 & /F2
F0F1 = F0 | F1
/F1F0 = /F1 & F0
/F1F3 = F3 & /F1
CMPS = CLA_en & /F1F0
CarryWrite = CLA_en | opcode=RC
RegisterWriteback = /F2 | /F1F3 | (/F3 & F1)
OrXor_en = NOR3(F1, F2, F3)
XOR_en = F0F1 & ROP2
NEG = (ROP2 & F1 & F0) | (CLA_en & F1F0)
Cin = NEG (more or less, but also need to decode ADD 0 cond)
PASS_en = /F1F3 & /(F2 & F0)
AND_en = PASS_en | (/F3 & /F1F0) | CLA_en
Maybe a 16×9 bits ROM would be better...
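Before choosing between gates and a ROM, the equations can be checked against the table. The Python sketch below encodes the table (without CryWr, since RO/RC depends on b11, and without the pass/clear notes), evaluates the equations for each Func code, asserts that every non-"x" cell matches, then packs the 8 checked signals into 16 ROM words. One term needed interpretation: if "F1F0" in the NEG equation is read as the F0F1 = F0 | F1 term defined above, CMPU would get NEG=0 while the table wants 1; the sketch uses /(F1 & F0), which gives NEG=1 for CMPU, CMPS and SUB and 0 for ADD. That is a reading of the table, not a confirmed correction. All names are hypothetical.

```python
# Control columns checked here (CryWr is left out because RO/RC depends on b11).
COLS = ("NEG", "PASS_en", "AND_en", "XOR_en", "OrXor_en", "CLA_en", "CMPS", "WB")

# Opcode -> (Func bits F3F2F1F0, expected values in COLS order); "x" = don't care.
TABLE = {
    "OR":     ("0000", "0 0 0 0 1 0 x 1"),
    "XOR":    ("0001", "0 0 1 1 1 0 x 1"),
    "AND":    ("0010", "0 0 0 1 0 0 x 1"),
    "ANDN":   ("0011", "1 0 0 1 0 0 x 1"),
    "CMPU":   ("0100", "1 0 1 x 0 1 0 0"),
    "CMPS":   ("0101", "1 0 1 x 0 1 1 0"),
    "SUB":    ("0110", "1 0 1 x 0 1 0 1"),
    "ADD":    ("0111", "0 0 1 x 0 1 0 1"),
    "SET":    ("1000", "x 1 1 x 0 0 x 1"),
    "CALL":   ("1001", "x 1 1 x 0 0 x 1"),
    "SH/SA":  ("1010", "x 0 0 0 0 0 x 1"),
    "RO/RC":  ("1011", "x 0 0 0 0 0 x 1"),
    "LDCL/H": ("1100", "x 1 1 x 0 0 x 1"),
    "IN":     ("1101", "x 0 0 0 0 0 x 1"),
    "OUT":    ("1110", "x 0 0 0 0 0 x 0"),
    "INV":    ("1111", "x 0 0 0 0 0 x 0"),
}

def inv(x):
    """1-bit inversion on 0/1 integers."""
    return 1 - x

def decode(func):
    """Evaluate the equations for one 4-bit Func string such as "0101"."""
    F3, F2, F1, F0 = (int(c) for c in func)
    CLA_en = F2 & inv(F3)
    ROP2 = inv(F3) & inv(F2)
    F0F1 = F0 | F1
    nF1F0 = inv(F1) & F0        # "/F1F0" in the log
    nF1F3 = F3 & inv(F1)        # "/F1F3" in the log
    PASS_en = nF1F3 & inv(F2 & F0)
    return {
        # Second term read as /(F1 & F0) to match the table (interpretation).
        "NEG": (ROP2 & F1 & F0) | (CLA_en & inv(F1 & F0)),
        "PASS_en": PASS_en,
        "AND_en": PASS_en | (inv(F3) & nF1F0) | CLA_en,
        "XOR_en": F0F1 & ROP2,
        "OrXor_en": inv(F1 | F2 | F3),
        "CLA_en": CLA_en,
        "CMPS": CLA_en & nF1F0,
        "WB": inv(F2) | nF1F3 | (inv(F3) & F1),
    }

def check_table():
    """Every non-'x' cell of the table must match the equations."""
    for name, (func, cells) in TABLE.items():
        got = decode(func)
        for col, want in zip(COLS, cells.split()):
            if want != "x":
                assert got[col] == int(want), (name, col)
    return True

def rom_word(func):
    """Pack the decoded signals into one ROM word, NEG as the MSB."""
    word = 0
    for col in COLS:
        word = (word << 1) | decode(func)[col]
    return word

ROM = [rom_word(format(op, "04b")) for op in range(16)]
```

The ROM resolves every don't-care with whatever the equations produce; a hand-filled ROM could choose those values freely instead.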