09/01/2020 at 03:15 •
This project will soon go in "shelved mode".
Its scope has grown and now goes beyond the PA3 family : I'd like to add the sxlib or the wsclib, among others, and even the cells included in the Google-funded Skywater PDK (who knows if I can get it one day).
Having a distinct and evocative name helps a lot in many respects. I struggled to identify or even talk about this project, as you can guess from its evasive name. I can now separate the PA3-related code from the gates architecture and the analysis framework, which is now available for any other gate family.
The transition will take a while and the last v2.9 is quite a nice package, but the directories and scripts will be quite different for the new system, far beyond what I considered for v2.10 !
Of course, #Libre Gates will stay compatible with the standard proasic3 library. The directory paths are different, but it's not a big deal.
08/31/2020 at 03:22 •
It only occurs to me now that the algorithm I use in v2.9 can be way faster and more efficient. The trick is to serialise the number of the output port on the respective net, and each sink de-serialises it in parallel. The number of runs is then proportional to the log2 of the number of gates, and this can be reduced even further because std_logic has 9 states, so 8 of them can encode 3 bits per cycle. The 9th state ('U') is then used to force the update of the signals.
A circuit with 4 inputs and 4 gates is mapped in 1 cycle, 2 cycles can test 64 signals, 3 cycles 512 signals... It's a crazy speedup !
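The arithmetic above can be sketched as a small package. This is only an illustration under assumptions: the names digits_needed, encode_digit and decode_digit are hypothetical (they are not part of the library), and mapping the digits 0 to 7 onto the eight non-'U' states in their declaration order is just one possible choice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package probe_codec is
  -- hypothetical helpers, for illustration only
  function digits_needed(count : positive) return positive;
  function encode_digit(number : natural; cycle : natural) return std_ulogic;
  function decode_digit(s : std_ulogic) return integer;
end package;

package body probe_codec is
  -- number of probe cycles needed to tell 'count' signals apart
  function digits_needed(count : positive) return positive is
    variable n        : positive := 1;
    variable capacity : positive := 8;
  begin
    while capacity < count loop
      capacity := capacity * 8;
      n := n + 1;
    end loop;
    return n;
  end function;

  -- base-8 digit sent during the given cycle, mapped to 'X','0','1','Z','W','L','H','-'
  -- (positions 1 to 8 of std_ulogic; position 0 is 'U' and stays reserved)
  function encode_digit(number : natural; cycle : natural) return std_ulogic is
  begin
    return std_ulogic'val(((number / 8**cycle) mod 8) + 1);
  end function;

  -- inverse mapping; returns -1 for 'U'
  function decode_digit(s : std_ulogic) return integer is
  begin
    return std_ulogic'pos(s) - 1;
  end function;
end package body;

library ieee;
use ieee.std_logic_1164.all;
use work.probe_codec.all;

entity probe_codec_tb is end entity;

architecture check of probe_codec_tb is
begin
  process
    variable rebuilt : natural;
  begin
    -- 8 signals fit in 1 cycle, 64 in 2 cycles, 512 in 3 cycles
    assert digits_needed(8) = 1 and digits_needed(64) = 2 and digits_needed(512) = 3
      report "unexpected cycle count" severity failure;
    -- every number below 512 survives a 3-cycle round trip
    for n in 0 to 511 loop
      rebuilt := 0;
      for c in 0 to 2 loop
        rebuilt := rebuilt + decode_digit(encode_digit(n, c)) * 8**c;
      end loop;
      assert rebuilt = n report "round trip failed" severity failure;
    end loop;
    wait;
  end process;
end architecture;
```

In a real netlist each driver would put encode_digit(...) on its output net, cycle after cycle, and each sink would accumulate the decoded digits.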
The funniest part is that I came up with this algorithm almost 20 years ago. I wonder why I didn't consider it until now...
A new twist can be added with the inclusion of some "signal integrity check". A linear function extends the range of the numbers to transmit, and adds a majority of numbers that are not valid. This should catch cases with driver conflicts because the library uses std_logic_vector and not std_ulogic_vector. If we consider that one more probe cycle is not a significant cost, then we can extend the coding space by a factor of 8. The linear function 5x+3 should work well enough. The function has another offset because the input number ("gate number") can be negative (for a port).
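Here is a minimal sketch of such an integrity check, assuming the code is 5x+3 applied to the number shifted by an offset. The names protect, is_valid and recover, as well as the constant PORT_OFFSET and its value, are hypothetical and only serve the illustration.

```vhdl
package probe_check is
  -- assumption: no port number is below -PORT_OFFSET
  constant PORT_OFFSET : natural := 16;
  function protect(number : integer) return natural;
  function is_valid(code : natural) return boolean;
  function recover(code : natural) return integer;
end package;

package body probe_check is
  -- shift the (possibly negative) number, then apply 5x+3
  function protect(number : integer) return natural is
  begin
    return 5 * (number + PORT_OFFSET) + 3;
  end function;

  -- only one code out of five is valid
  function is_valid(code : natural) return boolean is
  begin
    return (code - 3) mod 5 = 0;
  end function;

  -- inverse of protect, meaningful only when is_valid(code) is true
  function recover(code : natural) return integer is
  begin
    return (code - 3) / 5 - PORT_OFFSET;
  end function;
end package body;

use work.probe_check.all;

entity probe_check_tb is end entity;

architecture check of probe_check_tb is
begin
  process
  begin
    assert protect(1) = 88 and is_valid(88) and recover(88) = 1
      report "gate number not recovered" severity failure;
    assert protect(-16) = 3 and recover(3) = -16
      report "port number not recovered" severity failure;
    assert not is_valid(89)
      report "corrupted code not detected" severity failure;
    wait;
  end process;
end architecture;
```

With this choice four codes out of five are invalid, so a value mangled by a driver conflict has a good chance of being rejected, though not a guarantee.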
08/31/2020 at 00:45 •
The development cycle for v2.10 starts and it's a good time to discuss how v2.9 works, in particular for the internal data structures. They are well suited for what they do in v2.9 but, knowing what I know now, it is obvious I should have left more margin in the code. Going back to v2.9 is also a way to evaluate what I could change without breaking what (fragile) code exists and works until now.
Note : I will explicitly cover the mechanics behind the "probe" mode only. The "fast" mode and the "simple" version are not at stake here. What I want is a better framework to analyse a netlist with not just ports and boolean logic, but also latches and RS FF.
The very first mechanism is the census of the gates. Each instance gets its own unique number during elaboration, thanks to a little trick with the generics system. Here is the example of the generic 2-input gate :
entity generic_gate2 is
  generic( Gate_Number : integer := Gate_Census;
           exclude : std_logic_vector := "");
  port( A, B : in std_logic;
        LUT : in std_logic_vector;
        Y : out std_logic);
end generic_gate2;
architecture trace of generic_gate2 is
  signal LUT4 : LookupType4;
begin
  LUT4 <= InferLUT(LUT, LUT4'instance_name, Gate_Number, 3, exclude);
  Y <= Lookitup2(A, B, LUT4, Gate_Number) after ns;
end trace;
The generic's default value is given by the Gate_Census function, which is pretty basic :
-- just count the number of registered gates
impure function Gate_Census return integer is
begin
  gate_instance_counter := gate_instance_counter + 1;
  return gate_instance_counter;
end Gate_Census;
So yes it's just a counter, so far. But it is called for each and every analysed gate, and it updates the shared variable gate_instance_counter every time. After elaboration, not only does each gate have a unique number (very useful for later) but the analyser also knows how many gates to check. This is used at the end of the elaboration to allocate room for all of them, in an array called GateList.
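The census trick can be tried in isolation with the following sketch. It assumes VHDL-93 rules (a plain shared variable, not a protected type), and the names census_demo, counted_gate and census_tb are made up for this example.

```vhdl
package census_demo is
  shared variable gate_instance_counter : integer := 0;
  impure function Gate_Census return integer;
end package;

package body census_demo is
  impure function Gate_Census return integer is
  begin
    gate_instance_counter := gate_instance_counter + 1;
    return gate_instance_counter;
  end function;
end package body;

use work.census_demo.all;

entity counted_gate is
  -- the default expression is evaluated once per instance, during elaboration
  generic (Gate_Number : integer := Gate_Census);
end entity;

architecture demo of counted_gate is
begin
  process
  begin
    report "this instance is gate #" & integer'image(Gate_Number);
    wait;
  end process;
end architecture;

use work.census_demo.all;

entity census_tb is end entity;

architecture demo of census_tb is
begin
  -- no generic map: each instance falls back to the Gate_Census default
  g1 : entity work.counted_gate;
  g2 : entity work.counted_gate;
  g3 : entity work.counted_gate;

  process
  begin
    wait for 1 ns;
    report "gates counted : " & integer'image(gate_instance_counter);
    wait;
  end process;
end architecture;
```

Each instance should report a distinct number. Which instance gets which number depends on the elaboration order of the simulator.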
List of gates
After elaboration, the VHDL simulator initialises the signals. In the example gate above, the first line is executed first (for each instance of the component):
LUT4 <= InferLUT(LUT, LUT4'instance_name, Gate_Number, 3, exclude);
That's quite tricky and it tries to do a lot at once because there is so much to initialise...
- First, it has to write to a signal because otherwise, the execution of the InferLUT function would not happen at the right time (before the others). But since all the parameters shouldn't change (yes, right) this initialisation function is called just once.
- The first argument is the desired contents of the LUT, which can be given directly as a literal value (as in PA3_definitions.vhdl), as a generic or a port (as in the example, though this is not very "clean" because it creates the risk of re-running the initialisation routine)
- The next argument is the text string that identifies the gate, for human consumption and pretty-reporting.
- The Gate_Number is given so the function knows where to put the gate's definition.
- 3 is the last valid index in the LUT (the size minus one, here for a 4-entry LUT)
- exclude is a generic that optionally flags which input combinations not to care about.
So after the delta cycles of t=0 are over, GateList contains the list of all the gates. There is no real order, it's mostly defined by the inclusion order and this is in theory something that you shouldn't rely on (though GHDL does a great job of preserving the writing order).
Note that all the gates are included and counted but not all should be analysed : gates with 0 or 1 input are "degenerate" because a fault on them is equivalent to a fault on the input of the larger tested gates. The list of gate numbers to actually analyse is stored in indexPtr and its size is registered_instance_counter :
indexPtr := new integer_array(0 to gate_instance_counter);
in function InferLUT:
  -- register into the list of updates/display:
  if lastbit > 0 then
    indexPtr(registered_instance_counter) := gate_number;
    registered_instance_counter := registered_instance_counter + 1;
  end if;
The indexPtr array might not be filled, as registered_instance_counter can be less than gate_instance_counter. The extra unused space is just left alone... But at least the list is contiguous :-)
Specific gates can be hidden or ignored with this added layer, so we can focus on actual logic. It is mostly used by update_histogram() and display_histogram(). The complete list of gates can still be displayed/dumped with the runtime parameter -ggate_select_number=-1
By the time you use VectGen.vhdl, a trick is introduced and it matters a lot for the rest of the code :
Ports are given by a negative "gate number" value and the meaning depends on where the number is found.
- If the negative number is found at the output of a gate, the number must represent an output port.
- Similarly, input ports only make sense for inputs of gates.
So the input ports and output ports require less infrastructure and are represented by a couple of vectors:
-- notice the swap : the input vector is seen as outputs by the following gates...
shared variable output_records : input_records_access; -- the array of descriptors of the output vector
shared variable input_records : output_record_access; -- array of the descriptors of the input
Also notice that having negative numbers for the ports creates the corner case where the port index is 0. The consequence is that the first gate has index=1. There is no "gate number 0".
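The numbering convention can be summed up by a few helper functions. This is a sketch under an assumption that the text only implies: a port with index i is stored as the number -i, so 0 denotes port 0 and every strictly positive number denotes a gate. The names in net_numbers are hypothetical.

```vhdl
package net_numbers is
  function is_port(number : integer) return boolean;
  function port_number(index : natural) return integer;
  function port_index(number : integer) return natural;
end package;

package body net_numbers is
  -- zero and negative values denote ports, gates start at 1
  function is_port(number : integer) return boolean is
  begin
    return number <= 0;
  end function;

  -- port index -> stored number (assumed to be a simple negation)
  function port_number(index : natural) return integer is
  begin
    return -index;
  end function;

  -- stored number -> port index, only meaningful when is_port(number) is true
  function port_index(number : integer) return natural is
  begin
    assert is_port(number) report "not a port number" severity failure;
    return -number;
  end function;
end package body;
```

Whether the resulting index points into the input vector or the output vector still depends on where the number was found, as explained above.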
Once VectGen.vhdl is being used, a new structure appears, it's the "depth list", or an array of arrays of gate numbers. This is painful to build but invaluable for the next steps (up to the generation of the test vectors).
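An "array of arrays of gate numbers" can be declared with access types, as in this sketch. Only integer_array appears in the code quoted above. The other type names, and the way the list is filled here, are assumptions for the sake of the example and not the actual declarations of VectGen.vhdl.

```vhdl
package depth_list_demo is
  type integer_array is array (natural range <>) of integer;
  type integer_array_access is access integer_array;
  -- one list of gate numbers per logic depth
  type depth_list_type is array (natural range <>) of integer_array_access;
  type depth_list_access is access depth_list_type;
end package;

use work.depth_list_demo.all;

entity depth_list_tb is end entity;

architecture demo of depth_list_tb is
begin
  process
    variable depth_list : depth_list_access;
  begin
    -- two depth levels: gates 1 and 2 at depth 0, gate 3 at depth 1
    depth_list := new depth_list_type(0 to 1);
    depth_list(0) := new integer_array'(1, 2);
    depth_list(1) := new integer_array'(0 => 3);

    for d in depth_list.all'range loop
      report "depth " & integer'image(d) & " holds "
           & integer'image(depth_list(d).all'length) & " gate(s)";
    end loop;
    wait;
  end process;
end architecture;
```

Each level can have a different length, which is why a plain two-dimensional array would not fit.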
08/24/2020 at 18:43 •
No it's not another Doctor Who fanfic ;-)
The logic depth analysis detects logic loops in digital netlists and this is a good safety net because otherwise, it couldn't tell the depth anyway. This is explained in the log Zombies in v2.6 with the classic examples of a dual-NOR2 flip-flop and a MUX-based latch (among others) :
The analyser can detect the incoherent logical depth but can't tell if a logical loop is either a latch or an oscillator (or both ?) so it's safest to just exit with an error.
But this is not enough : not all digital circuits are pure boolean networks and loops happen one way or another. As described in the log 127. A tale of Flip-Flops, I also need to handle DFF, RS FF and transparent latches. I also describe a model for dealing with them all.
DFFs are quite easy to model : just consider their inputs and outputs as ports, because the effect is not on the current clock cycle.
But I also need Set/Reset flip-flops and I have defined a table of "macros" (which will then be substituted with the real gates) as well as a strategy to allow the analysis.
Precedence  S  R  Macro  Mapped to
Set         0  0  S0R0   AO1A
Set         0  1  S0R1   AO1C
Set         1  0  S1R0   AO1, AON21
Set         1  1  S1R1   AO1B, DLI1P1C1, AON21B
Reset       0  0  R0S0   OA1A
Reset       0  1  R1S0   OA1C
Reset       1  0  R0S1   OA1, OAN21
Reset       1  1  R1S1   OA1B, DLN1P1C1, OAN21B
Let's add to this a new "virtual component" : backwards is a "time machine" that sends the value of one net to another identical net but with a different logic depth. The "simple" definition is just a wire that will be simulated as such, but the analysis version breaks the temporal causality that defines the loop (while also preserving the fanout so this is not a buffer).
I had to invent a symbol so I chose a B with a leftwards arrow... Here is the "weird circuit" with the appropriate new symbol :
This new component is a first little step toward a more extensive redesign that will allow the other types of sequential gates to be analysed, such as this circuit (from an old IBM European patent, EP0092663A2, from 1983).
(there would be 2 "backwards" meta-components)
I have added the two files SRFF_simple.vhdl and SRFF_PA3.vhdl to the A3P library. They don't strictly belong there, but I have no other place for them (yet).
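Since the "simple" definition of backwards is described above as just a wire, it could look like the sketch below. The port names A and Y are assumptions, borrowed from the naming of the other gates. The analysis version, which breaks the loop for the depth computation, is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity backwards is
  port (A : in  std_logic;
        Y : out std_logic);
end entity;

architecture simple of backwards is
begin
  -- plain wire, no delay: the simulator sees the loop as it is
  Y <= A;
end architecture;
```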
08/16/2020 at 17:47 •
In the log 124. TAP timing & simulation I encountered a strange situation : the simulation would appear to take much longer than expected. This occurs after elaboration and during the first wait of the driving process.
This did not appear before because I think 2 factors played together for the first time :
- The design has grown and reaches about 100 gates, the largest I've tried so far. But this shouldn't be an issue by itself because the MUX64 is quite fast to sim.
- This is the first time I use a discrete flip-flop (a pair of NOR2) which could create undefined behaviours at start-up. But this shouldn't be a problem because the situation is already handled, right ?
Unfortunately I'm not able to trace what happens exactly but here are some tips :
- Make sure this is not an elaboration issue. A simple report lets you trace when and where the delay occurs. And normally, the total runtime should be directly proportional to the workload so change the iteration count for example.
- Check all the initial values of the signals, both in the declarations and in the testbench.
- Avoid undefined states and logical loops, and shield/split the units from them (gating/enabling data helps)
- Use different versions of the source code : change the architecture, try with a behavioural implementation, or the "simple" gates library.
In practice, the last tip can be a good compromise : it's not as fast as a behavioural model but less heavy than the "full trace" that is the default option.
However the "full trace" and "simple" libraries don't cohabit well but the design flow could require them both, depending on the level of analysis to perform. I have chosen to add a "simple" sub-directory in the latest version, where the homonymous version is built separately from the "full trace" version. Scripts using GHDL can then select which version is used on a case-by-case basis, depending on the requirements, by selecting the right path.
For the #YGREC8 this saves some seconds and uses more room but this is necessary because the tracing version can't analyse properly the sequential gates for now.
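The first tip of the list above (tracing with report and varying the workload) can be sketched with a bare testbench. The entity name trace_tb and the generic ITERATIONS are hypothetical. The report messages only mark where the simulation is, the wall-clock delay between them has to be observed while it runs.

```vhdl
entity trace_tb is
  -- change the workload to check that the runtime scales with it
  generic (ITERATIONS : positive := 10);
end entity;

architecture demo of trace_tb is
begin
  driver : process
  begin
    -- reaching this message means that elaboration is over
    report "driver process started";
    wait for 1 ns;
    -- a long pause before this message points at the initialisation of the netlist
    report "first wait done";
    for i in 1 to ITERATIONS loop
      wait for 1 ns;  -- the stimuli would be applied here
    end loop;
    report "stimulus loop finished";
    wait;
  end process;
end architecture;
```

With GHDL, the value of a top-level generic can be changed at run time with the -g parameter, like the -ggate_select_number=-1 example mentioned in an earlier log.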
07/18/2020 at 05:01 •
I just added a test of a few DFF ( DFN1E1C0 , DFN1C0 ) in the form of a Gray code counter. This uncovered a biiig bug in my code and libraries, which is now fixed. Burn all previous versions !
I also rearranged the order of the sanity tests / examples so the shortest come first.
This new release is already 198668 bytes when compressed...
I am now considering how I could integrate the FF in the verification system, they use a LUT2 so far so they are excluded from the algorithm...
07/14/2020 at 20:11 •
These newcomers are not part of the ProASIC3 family but are welcome extensions to the gates library, as discussed in 107. Choosing the gates. Say hello to OAI22, AOI22, OAI211 and AOI211 !
Oh and after I added them, I found this interesting study about them :
so I must not be completely wrong here.
03/24/2020 at 17:29 •
It adds (or restores) a feature/behaviour that was considered in the beginning, then abandoned, then lately I thought it would be cool
- to be able to simulate the designs at a slightly faster speed (though it's subjective)
- to be able to synthesise the designs on different platforms that don't understand/know Actel's legacy conventions.
The last point convinced me so here it is : PA3_definitions_simple_nodelay.vhdl
I copy-pasted the code that generates the LUTs. I had to modify the code of the MX2x gates and this also introduces a timing inconsistency but it's minor and not used (since it's apparent only when timing is used AND I'm NANDifying my code).
This version doesn't use external definitions that are necessary for analysis and introspection. It's purely so the Actelified source code can work on Lattice/Xilinx/Intel/etc.
The default version remains the "trace" one, and it can be regenerated at will, with the timing you want, since I have provided 2 scripts and the generator can be rerun as you like.
Update 2020028 : both versions are built now by a single script so they can be used in parallel, selected by the -P (include path) parameter of GHDL.
12/29/2019 at 20:24 •
It's the last update of this project for 2019 !
I have reorganised the files and directories a bit... More needs to be done but it's satisfying so far and I need to go back to work on the ALU8 unit. It was a good opportunity to test that the library integrates well with other projects so now I'm back to the #YGREC8 project for a bit.
See you in the next decade !
12/28/2019 at 02:42 •
Update: some "zombies" have an explicit solution, explained in https://hackaday.io/project/162594-vhdl-library-for-gate-level-verification/log/182670-time-travel-and-zombies
I'm still trying to figure out a good method to solve the final problem of vector generation.
Meanwhile I'm also making v2.6 more solid with a better detection of netlist warts, such as unconnected sinks and sources. I label those faulty gates or ports "zombies". I don't make much effort to "clean up" the netlist, instead I bail out if the netlist contains zombies. There is no point in trying because the vectors make sense only on a final design. However I try to make the output less cryptic and more useful when the tool is used as a simple netlist checker during design and for regression tests.
Also : the netlist will not "see" GND and VCC gates because they have no input. They don't make sense anyway, though A3P netlists often contain these... I consider these as "NOFIX".
A3Ptiles_v2.6_20191228.tgz has all the good stuff.
More info can be found in the test4_cornercases directory. The weird_unit.vhdl file implements the following circuits:
ff1 and ff2 implement a well-known set/reset flip-flop circuit and the outputs are "sequential" because they depend on the previous state, so they are "zombies". Both NAND2 are correctly flagged.
The mx0 implements a transparent latch. mx0 is correctly flagged, as well as the next gate that depends on the output.
Out(4) is left unconnected.
open1 and open2 are two chained gates with the final output dangling open. This is where things get a bit complex because open2 is correctly flagged as a zombie but not open1. Removing both would affect other parts of the system, such as reducing the depth of the design, which would create an avalanche of other effects...
open3 is correctly flagged as a zombie because both inputs are unconnected.
The end of the report says this:
Latency of the 5 outputs :
  Output#0 : N/A
  Output#1 : N/A
  Output#2 : N/A
  Output#3 : N/A
  Output#4 : N/A
Found 6 zombie gates or inputs (unconnected or loops) :
 - Gate #6 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open2@and2(trace):lut4
 - Gate #1 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff1@nand2(trace):lut4
 - Gate #2 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff2@nand2(trace):lut4
 - Gate #3 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):mx0@mx2(trace):lut8
 - Gate #4 : fanout=1 - :vectgen(plip):wrap@vg_wrapper(weird):dummy@and2(trace):lut4
 - Gate #7 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open3@and2(trace):lut4
I chose to flag the errors rather than correct them (by pruning), for these reasons :
- The correction/remedy could make the situation worse if there is a misunderstanding or a bug
- It is not the purpose of the system : I just want to make sure the vectors are generated with a sane dataset.
- It was not planned in advance and this "late feature" is harder to add (it was not straight-forward to implement anyway)
- Feature creep is bad and I want to KISS.
So I hope people can use this tool as a filter : I try to present useful information so that users can correct their design.