Libre Gates

A Libre VHDL framework for the static and dynamic analysis of mapped, gate-level circuits for FPGA and ASIC. Design For Test or nothing!

As the project "VHDL library for gate-level verification" progressed, more features and more abstraction were developed, so that libraries other than the ProASIC3 family could be implemented. ASIC libraries such as sxlib or wsclib are good candidates, and more will appear with the surge in DIY ASIC projects spurred by Google & Skywater.

This project brings its pure GHDL-driven code to more technologies, allowing users to check their circuit, verify BIST coverage and eventually generate test vectors with only a set of truly Libre files in pure VHDL (thus avoiding the expensive and restrictive license fees of niche proprietary vendors). Oh, did I mention how awesome GHDL is? But any fully VHDL'93-compliant simulator should work too.

Import from and export to other netlist formats are also in the air...

These files let you process VHDL files mapped to FPGA/ASIC gates, to:

  • Simulate the circuit (for example, if your synthesiser has mapped the gates to a given PDK but you don't have the corresponding gates in VHDL)
  • Perform static analysis of the netlist (unconnected inputs or outputs, and other common mistakes)
  • Extract dynamic activity statistics (how often does a wire flip state, if at all?)
  • Verify that any internal state can be reached (thus helping with logic simplifications)
  • Alter any boolean function, inject arbitrary errors and prove your BIST strategy
  • Extract logic traversal depth and estimate speed/latency
  • Inspect logic cones, see what inputs and outputs affect what
  • Help with replacing DFFs with transparent latches
  • Ensure that the circuit is correctly initialised with the minimal number of /RESET signals
  • Detect and break unexpected logic loops or chains

Some day, it could be extended to:

  • Pipeline a netlist and choose the appropriate strategy (will require detailed timing information)
  • Transcode/Transpile a netlist from one family/technology to another
  • Import/export to EDIF or others?

Note: since the tool typically processes netlists before place&route, no wiring parasitics data are available yet, so no precise timing extraction is possible and the tool doesn't even try. It can still help, in particular by extracting the criticality of each path, which then guides the mapping of gates to the proper fanout.

The project started as "VHDL library for gate-level verification", but its scope keeps extending and now greatly surpasses the ProASIC3 domain. I am also studying the addition of the minimalist OSU FreePDK45. More unrelated libraries should be added in the near future, depending on applications: Skywater PDK and Alliance could follow. Contact me if you need something!

1. First upload
2. Second upload
3. Rewrite
4. More features ! (one day)
5. Another method to create the wrapper
6. inside out
7. OSU FreePDK45 and derivatives
8. The new netlist scanner
9. Chasing a "unexpected feature" in GHDL
10. Polishing and more bash hacking
11. Completeness of a simple heuristic
12. Benchmarking results
13. Wrapper rewrite


New wrapper generator, new brute-force test/benchmark...

x-bzip-compressed-tar - 232.98 kB - 11/21/2020 at 20:17



Faster fault test with parallel execution, weird bash bug squashed.

x-bzip-compressed-tar - 199.16 kB - 11/16/2020 at 03:17



Self-tests pass OK, OSU barely started, wrapper generator OK with split source files, netlist-probe remains to be done, some obsolete stuff will be removed later.

x-bzip-compressed-tar - 198.16 kB - 11/08/2020 at 05:05



copy from

gzip - 1.69 MB - 09/20/2020 at 18:38



Another interim release: the wrapper generator is finally working! But the VHDL side must be updated.

x-bzip-compressed-tar - 195.26 kB - 09/13/2020 at 00:44



  • Wrapper rewrite

    Yann Guidon / YGDES, 4 days ago

    Good news, everyone!

    The benchmarking results are encouraging, and they were made possible by a new, rewritten version of the wrapper, which now even handles some essential generics! You can expose generics of integer and string-based types, including std_logic, text and SLVx.

    The core of this tool relies on GHDL's XML output, which is then parsed by a crude bash script. This is part of the new release :-)

  • Benchmarking results

    Yann Guidon / YGDES, 5 days ago

    The results are finally available!

    [yg@localhost test6_bigLFSR]$ ./ 
    Simple gate version setup (RAM+time) :
    290 4260 0:00.00
    580 4528 0:00.00
    1160 4800 0:00.00
    2030 5576 0:00.00
    2900 6132 0:00.00
    5800 8272 0:00.01
    11600 12296 0:00.02
    20300 18568 0:00.03
    29000 24744 0:00.05
    58000 45264 0:00.09
    116000 86364 0:00.18
    203000 148588 0:00.30
    290000 210440 0:00.44
    580000 417476 0:00.91
    1160000 831120 0:01.77
    2030000 1451852 0:03.11
    Detailed gate version setup (RAM+time) :
    290 5480 0:00.00
    580 6652 0:00.01
    1160 8668 0:00.01
    2030 12052 0:00.03
    2900 15336 0:00.04
    5800 26764 0:00.08
    11600 49428 0:00.16
    20300 83392 0:00.28
    29000 117340 0:00.39
    58000 230732 0:00.79
    116000 457372 0:01.53
    203000 797488 0:02.70
    290000 1137588 0:03.81
    580000 2270924 0:07.51
    1160000 4538012 0:15.17
    2030000 7938396 0:27.71
    Benchmark : OK

    I wanted to test several things :

    • Time and RAM are roughly linear, which is good news. Note that this is only the setup performance.
    • Setup time is roughly 9× longer with the detailed version, and I don't even run an iteration!
    • The detailed version uses about 5.5× more RAM, which amounts to about 4KB for a single gate!

    This means that with 16GB RAM, it is possible to simulate approx. 20M gates and analyse 3M gates.

    You can run this test manually with the newer archives. It's a stress test for the system and the behaviour will change depending on your computer configuration. I don't assume your CPU speed or RAM size, so run it cautiously.

    Update 20201121:

    I managed to run the design inside the wrapper, and the overhead is marginal (5% size, <10% time)

    290 5516 0:00.00
    580 6808 0:00.01
    1160 9200 0:00.01
    2030 12676 0:00.03
    2900 16228 0:00.04
    5800 28088 0:00.08
    11600 51880 0:00.17
    20300 87528 0:00.29
    29000 123120 0:00.41
    58000 241696 0:00.78
    116000 479028 0:01.55
    203000 835028 0:02.78
    290000 1191172 0:03.92
    580000 2377824 0:07.98

    The graphs will be auto-generated if you have installed gnuplot on your system.

    I still have to perform dynamic comparisons, and I have not even started re-implementing the gate probes.

    Anyway, the roughly 9× speed and 5.5× size gain of the "simple" version vindicates the choice, and the effort, of making 2 versions.

  • Benchmarking with a HUGE LFSR

    Yann Guidon / YGDES, 11/16/2020 at 08:57

    After I solved the weird issues of the log 9. Chasing a "unexpected feature" in GHDL, it's time to put the lessons into practice and implement that huge fat ugly LFSR. It's not meant to be useful beyond unrolling many, many LFSR stages to see how your computer and my code behave. So I created test6_bigLFSR/ in the project.

    The LFSR's poly was finally chosen thanks to a site that hosts a huge collection of primitive polynomials. For 32 bits, I downloaded 32.dat.gz (186MB), which expands to 600MB. It's huge but practical because you can grep all you want inside it :-) The densest poly is 0xFFFFFFFA, which is also the last one. It contains 29 XORs, almost all contiguous, which makes coding easy!

    For a quick test, I wrote lfsr.c, which helps visualise the behaviour. The code kernel is a 2-step dance: selective XORing followed by a rotation.

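    #include <stdint.h>

    typedef uint32_t U32;
    #define LFSR_POLY 0xFFFFFFFAu   /* the densest 32-bit poly */
    static U32 LFSR_reg = 1;        /* assumed seed: any non-zero value */

    U32 lfsr(void) {
      U32 u=LFSR_reg;
      if (LFSR_reg & 1)             /* XOR the poly when the LSB is set... */
        u ^= LFSR_POLY;
      LFSR_reg = (u >> 1) | (u << 31);  /* ...then rotate right by 1 bit */
      return LFSR_reg;
    }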

    To help put this code in perspective, I also created a small 5-bit LFSR in circuitjs, with another ultra-dense poly: 0x1E.

    One of the subtleties of LFSR poly notation is that the MSB (which is always 1) describes the link from the LSB to the MSB and does not imply a XOR gate.

    Unrolling the LFSR is pretty easy with copy-paste. It is however crucial to keep the connections accurate.

    Each column of XOR2s has its own 5 signals, so all is fine and should work. However, we have already seen that GHDL has some issues with massive assignments, so a shortcut is necessary. It's easy to spot when we move the wires around: there is no need to copy one stage to the next, just get the value directly from the appropriate previous stage.

    There are still 5 wires between each stage, but only 3 need to be stored; the others are retrieved "from the past". From there the rule is obvious: the benchmark needs only as many storage elements as there are XOR gates, which is 29 for the 0xFFFFFFFA poly. The new issue is that the 0x1E poly is not totally like 0xF...A: there is one bit of difference. I will now illustrate it with a reduced version, 0xFA, and extrapolate from there. Here it is with circuitjs:

    By coincidence, 0xFA is also a primitive poly, so it also provides a 255-cycle loop, just try it! The 32-bit implementation will simply add 6×4 = 24 more consecutive XORs to the circuit.

    Unrolling is very similar. The critical part is to get the connections "right". Fortunately, the only difference is the absence of a XOR just above the LSB, which translates into sending the result to the cycle after the current one. The resulting circuit is:

    Note: there are 2×3=6 stages, while the LFSR has a period of 255=3×5×17 so the resulting circuit still has a period of 255. Not that it matters but 1) it's good to know in case you encounter this situation 2) it motivates me and brings challenging practical constraints into the benchmark :-)

    So all there is to do now is to add as many taps as necessary to get back to 0xFFFFFFFA. Oh, and also deal with the initial and final taps... So let's map the 32 taps to the 29-XOR vector, called XOV, at time t:

    • All gates receive one signal from XOV(t-1)(0)
    • The other signal comes from t-1, t-2 and t-3:
      • For XOV(t)(0) : XOV(t-2)(1)
      • For XOV(t)(1 to 'last-1) : XOV(t-1)(2 to 'last)
      • For XOV(t)('last) : XOV(t-3)(0)

    This list of spatio-temporal links is illustrated below:

    From a theoretical point of view, this shows how the Galois and Fibonacci structures are 2 ways to express the same thing or process:

    • The Galois structure works in parallel: all the elements are available at one point in time.
    • The Fibonacci structure is serialised, with only one value changed per step, but with visibility into the "past", the previous values.

    The circuit described here is in the weird crossover region between these approaches. The above list shows how to wire the XOR gates and the outputs.

    Connecting the inputs is a bit less trivial...


  • Completeness of a simple heuristic

    Yann Guidon / YGDES, 11/16/2020 at 04:38

    The archive contains several tests, including some exhaustive fault injection scans. The scan algorithm ignores all the gates with fewer than 2 inputs because:

    • No-input gates are constants and are not really implemented in ASIC. Nothing more to say about it.
    • 1-input gates can only be inverters or buffers and they amount to a wire: the logic value is propagated (even if inverted) but if the gate is altered, it then behaves like a fixed value (a no-input gate) which can be detected.

    Let's consider a gate BUF (input A, output Y) with LUT2(0,1):

    • altering bit 0 will flip the 0 to 1, giving LUT(1,1), which works like VCC,
    • altering bit 1 will flip the 1 to 0, giving LUT(0,0), which works like GND.

    So as long as the A input is toggled, the Y output will change (or not, if there is a fault). This change is propagated by:

    • output ports, or
    • gate sinks which will in turn toggle output ports.

    In conclusion, only gates with 2 or more inputs need "LUT bit flips" to check the circuit.

    This is in stark contrast with verification methods from the 60s, when logic was wired (often manually) and the connections themselves were delicate. Any fault needed to be identified, located and fixed, so the automated systems focused on the observability of each wire, sometimes forcing the addition of extra "observability wires" to circuits.

    This old method has been carried over to IC design but the needs have changed: we only need to know whether a circuit works correctly, we don't care much about why or where it fails (except for batch reliability analysis), so there is no need to focus on the wires.

    However, some inference algorithms are shared because we still have to determine 2 things:

    • How to observe a gate's output
    • How to force a gate's input to a given value

    This is where things will be difficult.

  • Polishing and more bash hacking

    Yann Guidon / YGDES, 11/12/2020 at 11:35

    Before digging too deep into the netlist extractor, I wanted to review and massage the files a bit. @llo helped and installed the latest GHDL on Fedora over WSL: several warnings uncovered some variable naming issues, now solved. Thanks Laura!

    The archive is now almost easy to use, but speed could still be better. Building the library is essentially linear, but the various tests could be run in parallel. It does not matter much because they are pretty short, except for the longest step in ALU8 where all the 500+ faults are injected, and it takes a while... The fault verification of INC8 could also take less time if all 4 cores could run simultaneously in this "trivially parallel" application.

    I still don't want to use make. There are other parallelising helpers such as xargs or GNU parallel, but they create too many complications for passing information back and forth. I also want to avoid semaphores, critical sections or locks... and I finally found the right ingredients using only bash! Here is the structure:

    { # group the whole script so that its stderr can be hidden at the end
    # if NBTHREADS is not set or invalid, then query the system
    [[ $(( NBTHREADS + 0 )) -eq 0 ]] &&
      NBTHREADS=$( { which nproc > /dev/null 2>&1 ; } && nproc ||
         grep -c ^processor /proc/cpuinfo )
    echo "using $NBTHREADS parallel threads"
    function TheWorkload () {
      echo "starting workload #$1"
      sleep $(( (RANDOM % 5) + 1 ))
      echo "end of workload #$1"
    }
    threads=0
    for i in $( seq 1 20 )
    do
      threads=$(( $threads + 1 ))
      echo "loop #$i, $threads running threads"
      TheWorkload $i &
      if [[ $threads -ge $NBTHREADS ]] ; then
        echo "waiting ..."
        wait -n   # block until any one workload ends
        threads=$(( $threads - 1 ))
      fi
    done
    wait   # let the last workloads complete
    echo "End of script !"
    } 2> /dev/null # hide bash's termination messages

    Something is still missing: the loop must stop if one workload fails. It's quite delicate with more sophisticated tools, but in bash it's as easy as detecting the fault and breaking the loop. This leads to this new version:

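    { # group the whole script so that its stderr can be hidden at the end
    [[ $(( NBTHREADS + 0 )) -eq 0 ]] &&
      NBTHREADS=$( { which nproc > /dev/null 2>&1 ; } && nproc ||
         grep -c ^processor /proc/cpuinfo )
    echo "using $NBTHREADS parallel threads"
    function TheWorkload () {
      echo "starting workload #$1"
      sleep $(( (RANDOM % 5) + 1 ))
      echo "end of workload #$1"
      # make the 12th run break the loop
      [[ "$1" -eq "12" ]] && {
        echo "$1 reached !"
        err_flag=1   # only changes the copy in this background subshell
        return 1 ; } # the exit status is what reaches the main loop
      return 0
    }
    threads=0
    err_flag=0
    for i in $( seq 1 20 )
    do
      threads=$(( $threads + 1 ))
      echo "loop #$i, $threads running threads"
      TheWorkload $i &
      if [[ $threads -ge $NBTHREADS ]] ; then
        echo "waiting ..."
        # wait -n returns the exit status of the job that just ended
        wait -n || {
          echo "Got an error !" ; err_flag=1 ; break ; }
        threads=$(( $threads - 1 ))
      fi
    done
    # let the remaining workloads complete and check their errors too
    # (wait -n returns 127 when no child is left)
    while wait -n ; st=$? ; [[ $st -ne 127 ]] ; do
      [[ $st -ne 0 ]] && err_flag=1
    done
    [[ $err_flag -eq "0" ]] && { echo "it worked."
             } ||   echo "premature termination"
    } 2> /dev/null # hide bash's job termination messages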

    Since everything is controlled by a single loop in the script, and the flag can only be written by the workload, there is no risk of race condition, so no need for semaphores or anything else... The only tricky part is to ensure that the remaining tasks complete and have their errors checked too.

    I have seen other similar examples that use the jobs command to list the spawned threads, but there is the risk of catching sub-sub-threads if the workload forks... Using a thread counter is faster (no fork-exec of other tools) and limits the scope of the test, which removes interference.

    There are more issues to solve, such as adapting the result of each invocation of the GHDL-generated binary. But another place where significant time can be saved is the simulation setup: the binary is restarted every time even though only tiny changes are made. The VHDL simulation system must also be updated...


    Hmmm it seems I botched the script and made wrong assumptions. It is not possible to send a value back to the main script once a workload has forked. Yet this code works because...


  • Chasing a "unexpected feature" in GHDL

    Yann Guidon / YGDES, 11/09/2020 at 00:31

    I know, I know, I should update my SW, but if a 2017 build of GHDL misbehaves in significant ways, I wonder why it took so long to become apparent.

    So I'm adding a new "loading" test to the self-check suite, because I want to know how many gates I can reasonably handle, and I get weird results. It seems that GHDL's memory allocation is exponential, and this should not be so. So I sent a message to Tristan, who was curious and asked for a repro, and here I am rewriting the test as a self-contained file for easy reproduction.

    The test is simple: to exercise my simulator, I create an array of fixed-size std_logic_vectors, with a size defined by a generic. Then I connect elements of one line to elements of the next line. Finally, I connect the first and last lines to an input and an output port, so the circuit can be controlled and observed by external code.

    The result will surprise you.

    So let's define the test code.

    -- ghdl_expo.vhdl
    -- created lun. nov.  9 00:56:57 CET 2020 by Yann Guidon
    -- repro of "exponential memory allocation" bug
    Library ieee;
        use ieee.std_logic_1164.all;
    entity xor2 is
      port (A, B : in  std_logic;
               Y : out std_logic);
    end xor2;
    architecture simple of xor2 is
    begin
      Y <= A xor B;
    end simple;
    Library ieee;
        use ieee.std_logic_1164.all;
    Library work;
        use work.all;
    entity ghdl_expo is
      generic (
        test_nr : integer := 0;
        layers : positive := 10);
      port (
        A : in  std_logic_vector(31 downto 0) ;
        Y : out std_logic_vector(31 downto 0));
    end ghdl_expo;
    architecture unrolled of ghdl_expo is
      subtype SLV32 is std_logic_vector(31 downto 0);
      type ArSLV32 is array (0 to layers) of SLV32;
      signal AR32 : ArSLV32;
    begin
      AR32(0) <= A;
      t1: if test_nr > 0 generate
        l1: for l in 1 to layers generate
          AR32(l)(0) <= AR32(l-1)(31);
          t2: if test_nr > 1 generate
            l2: for k in 0 to 30 generate
              t3: if test_nr = 3 generate
                x: entity xor2 port map(
                   A => AR32(l-1)(k), B=>AR32(l-1)(31),
                   Y => AR32(l)(k+1));
              end generate;
              t4: if test_nr = 4 generate
                AR32(l)(k+1) <= AR32(l-1)(k) xor AR32(l-1)(31);
              end generate;
              t5: if test_nr = 5 generate
                AR32(l)(k+1) <= AR32(l-1)(k);
              end generate;
            end generate;
          end generate;
        end generate;
      end generate;
      Y <= AR32(layers);
    end unrolled;

    There are 2 generics:

    • test_nr selects the code path to test.
    • layers selects the number of elements of the array.

    And now let's run it:

    rm -f *.o *.cf ghdl_expo log*.txt
    ghdl -a ghdl_expo.vhdl &&
    ghdl -e ghdl_expo &&
    # first test, doing nothing but allocation.
    ( for i in $(seq 2 2 100 )
      do
      echo -n ${i}000
      ( /usr/bin/time ./ghdl_expo -glayers=${i}000 ) 2>&1 |
          grep elapsed |sed 's/max.*//'|
           sed 's/.*0[:]/ /'| sed 's/elapsed.*avgdata//'
    done ) | tee log1.txt

    The results (time and size) are logged into log1.txt, which gnuplot will happily display:

    set xlabel 'size (K vectors)'
    set ylabel 'seconds'
    set yr [0:1.5]
    set y2label 'kBytes'
    set y2r [0:800000]
    set y2tics
    plot "log1.txt" using 1:2 title "time (s)" w lines, "log1.txt" using 1:3 axes x1y2 title "allocated memory (kB)" w lines

    And the result is as expected: linear.

    The curve starts at a bit less than 4MB for the naked program, which allocated 10 lines of 32 std_logic. Nothing to say here.

    That said, I expected better than 1.13 seconds to allocate 100K lines, which should occupy 3.2MB by themselves (so less than 8MB in total). Instead, the program eats 710MB! That is a puzzling expansion factor of more than 200! Anyway, I'm grateful it works so far.

    Now, I activate test 1:

    ( for i in $(seq 2 2 100 )
      do
      echo -n ${i}00
      ( /usr/bin/time ./ghdl_expo -gtest_nr=1 -glayers=${i}00 ) 2>&1 | 
      grep elapsed |sed 's/max.*//'| sed 's/.*0[:]/ /' | 
         sed 's/elapsed.*avgdata//'
    done ) | tee log2.txt

    The result looks different :

    Notice that I have reduced the number of elements by 10 and the max. run time is similar. Worse: the 10K vectors now use 3.2GB! The expansion ratio has gone to 10K!...


  • The new netlist scanner

    Yann Guidon / YGDES, 11/06/2020 at 08:33

    See the beginning in the log 3. Rewrite, as well as 35. Internal Representation in v2.9, for more dirty details!

    After a hiatus, it's finally time to work again on the implementation of the algorithm explained in the log 36. An even faster and easier algorithm to map the netlist!

    The linear function 5x+3 can be configured to trade off security for speed, with POLY_OFFSET and POLY_FACTOR. The higher the factor, the better the discrimination of snafus, but that increases the number of steps. A generic with default value is appropriate in this case.

    The principle of serialising and de-serialising through wire-type symbols is shown in the picture below. The original idea, around 2001, used binary codes for real wires but std_logic provides 8 useful symbols, plus the "refresh" one ('U').

    Conversion between integers and the enumerated std_logic type is not as trivial as in other languages but still very easy, when you "get" how to dance around the strong typing constraints, as shown in this code:

    Library ieee;
        use ieee.std_logic_1164.all;
    entity test_std is
    end test_std;
    architecture arch of test_std is
    begin
      process is
        variable i : integer;
        variable s : std_logic;
      begin
        for j in std_logic'low to std_logic'high loop
          if j = std_logic'high then
             s := 'X';  -- not 'U', to show a custom wrap-around
          else
             s := std_logic'succ(j);
          end if;
          report integer'image(std_logic'pos(j))
                & " : " & std_logic'image(j)
                & " : " & std_logic'image(s);
        end loop;
        for i in 0 to 7 loop
          s := std_logic'val(i+1);
          report std_logic'image(s);
        end loop;
        wait;  -- run only once
      end process;
    end arch;

    The result:

    $ rm -f test_std *.o *.cf && ghdl -a test_std.vhdl && ghdl -e test_std && ./test_std 
    test_std.vhdl:24:7:@0ms:(report note): 0 : 'U' : 'X'
    test_std.vhdl:24:7:@0ms:(report note): 1 : 'X' : '0'
    test_std.vhdl:24:7:@0ms:(report note): 2 : '0' : '1'
    test_std.vhdl:24:7:@0ms:(report note): 3 : '1' : 'Z'
    test_std.vhdl:24:7:@0ms:(report note): 4 : 'Z' : 'W'
    test_std.vhdl:24:7:@0ms:(report note): 5 : 'W' : 'L'
    test_std.vhdl:24:7:@0ms:(report note): 6 : 'L' : 'H'
    test_std.vhdl:24:7:@0ms:(report note): 7 : 'H' : '-'
    test_std.vhdl:24:7:@0ms:(report note): 8 : '-' : 'X'
    test_std.vhdl:31:7:@0ms:(report note): 'X'
    test_std.vhdl:31:7:@0ms:(report note): '0'
    test_std.vhdl:31:7:@0ms:(report note): '1'
    test_std.vhdl:31:7:@0ms:(report note): 'Z'
    test_std.vhdl:31:7:@0ms:(report note): 'W'
    test_std.vhdl:31:7:@0ms:(report note): 'L'
    test_std.vhdl:31:7:@0ms:(report note): 'H'
    test_std.vhdl:31:7:@0ms:(report note): '-'

    So it's pretty trivial to convert an int to std_logic and vice versa. It's a bit less so to extract bit fields from an integer, because VHDL does not provide shift and boolean operators for integers :-( As I tested a decade ago, going through bit vectors is too slow. The only way is to divide or multiply, but the semantics are somewhat lost and this does not work correctly with negative numbers (due to inappropriate rounding). I don't want to use my own shift&bool routines because they link to C code and that might break somehow in the future with new revisions of GHDL.

    There is also a constraint on the amount of data that can be stored in the descriptor of each gate. For example a complete precomputed std_logic_vector would take too much room and the size would not be easy to determine before running the whole thing.

    One compromise could be to store one std_logic element that is precomputed before each new pass. There are already 2 such variables in the record of each gate, curOut and prevOut, but then, what about the input gates?

    curOut,                 -- the last result of lookup
    prevOut : std_logic;    -- the previous result of the lookup
    changes : big_uint_t;   -- how many times the output changed
    LUT     : std_logic_vector(0 to 15); -- cache of the gate's LUT.
    sinks : sink_number_array(0 to 3);

    The variable changes can be used to accumulate the number to serialise, but the LUT can't be overwritten because it's necessary for the following steps of the algorithm.

    sinks is...


  • OSU FreePDK45 and derivatives

    Yann Guidon / YGDES, 09/20/2020 at 18:07

    One easy library to add to the collection is OSU's FreePDK45 (direct download: only 1.7MB, also mirrored in this project's files).

    It's a good candidate because it's a surprisingly tiny library: only 33 cells!

    FILL (no logic input or output)


    AND2X1 AND2X2 HAX1 NAND2X1 OR2X1 OR2X2 NOR2X1 XNOR2X1 XOR2X1 (2 inputs)

    AOI21X1 FAX1 MUX2X1 NAND3X1 NOR3X1 OAI21X1 (3 inputs)

    AOI22X1 OAI22X1 (4 inputs)


    This is close to the bare minimum, but it should be enough for basic circuits. In fact, we have the 1st order and most of the 2nd order gates, which I covered in log 31. v2.9 : introducing 4-input gates. OAI211 and AOI211 are missing, though they are very useful for incrementers and adders...

    The site also provides these same basic standard cells for AMI 0.6um, AMI 0.35um, TSMC 0.25um and TSMC 0.18um, released in 2005: a single bundle packages 4 technologies! The library seems to have evolved up to v2.7, which included a LEON example project (but the latest archive looks broken).

    I love how minimal, compact and simple these gates are, accessible to beginners and targeting from 45nm to .5µm. It looks like a "RISC" approach to VLSI ;-) Note also that only 2 gates come with two output drives, both with 2 inputs: AND and OR, which are 2nd order gates merging a NAND/NOR with an X1 or X2 inverter. The rest of the fanout issues are solved by inserting INVX gates at critical fanout points. This means that physical mapping/synthesis should start by examining the fanouts, inserting the inverters/buffers and then deducing which logic function to bubble-push.

    I'm not sure how to create the files but I can easily derive them from the A3P files in 3 ways:

    • Modify the file generator
    • Modify the generated files
    • Create a "wrapper" file

    Furthermore, the precedence of the inputs of the sequential gates is not clear yet.

    Extracting the logical functions was as simple as a grep and some sed:

    grep -r '>Y=' * |sed 's/.*data//'|sed 's/<tr.*FFF//'|sed 's/<.*//'|sed 's/[.]html.*Y/: Y/'
    CLKBUF2: Y=A
    AND2X2: Y=(A&B)
    NAND3X1: Y=!(A&B&C)
    NOR3X1: Y=!(A|B|C)
    XOR2X1: Y=(A^B)
    BUFX4: Y=A
    MUX2X1: Y=!(S?(A:B))
    OR2X1: Y=(A|B)
    AND2X1: Y=(A&B)
    TBUFX2: Y=(EN?!A:'BZ)
    /INVX8: Y=!A
    /CLKBUF3: Y=A
    INVX1: Y=!A
    AOI21X1: Y=!((A&B)|C)
    XNOR2X1: Y=!(A^B)
    BUFX2: Y=A
    OAI22X1: Y=!((C|D)&(A|B))
    TBUFX1: Y=(EN?!A:'BZ)
    OR2X2: Y=(A|B)
    OAI21X1: Y=!((A|B)&C)
    NAND2X1: Y=!(A&B)
    AOI22X1: Y=!((C&D)|(A&B))
    CLKBUF1: Y=A
    NOR2X1: Y=!(A|B)
    INVX4: Y=!A
    INVX2: Y=!A

    For the VHDL version, only a few more simple substitutions are required:

    $ grep -r '>Y=' * |sed 's/.*data[/]/"/'|sed 's/<tr.*FFF//'|sed 's/<.*//'|sed 's/[.]html.*Y=/": /' |sed 's/[!]/not /g'  |sed 's/[|]/ or /g'  |sed 's/[&]/ and /g' |sed 's/\^/ xor /g' |sort
    "AND2X1": (A and B)
    "AND2X2": (A and B)
    "AOI21X1": not ((A and B) or C)
    "AOI22X1": not ((C and D) or (A and B))
    "BUFX2": A
    "BUFX4": A
    "CLKBUF1": A
    "CLKBUF2": A
    "CLKBUF3": A
    "INVX1": not A
    "INVX2": not A
    "INVX4": not A
    "INVX8": not A
    "MUX2X1": not (S?(A:B))
    "NAND2X1": not (A and B)
    "NAND3X1": not (A and B and C)
    "NOR2X1": not (A or B)
    "NOR3X1": not (A or B or C)
    "OAI21X1": not ((A or B) and C)
    "OAI22X1": not ((C or D) and (A or B))
    "OR2X1": (A or B)
    "OR2X2": (A or B)
    "TBUFX1": (EN?not A:'BZ)
    "TBUFX2": (EN?not A:'BZ)
    "XNOR2X1": not (A xor B)
    "XOR2X1": (A xor B)

    Notice that MUX2 should be called MUXI and A and B are swapped relative to the A3P lib. Some other corner cases are easily translated by hand as well. The TBUFs however fall outside of the purely boolean realm...

    Some gates have 2 outputs...


  • inside out

    Yann Guidon / YGDES, 09/14/2020 at 17:51

    The log 5. Another method to create the wrapper successfully implemented the automatic wrapper that I intended to design one year ago. This is a relief, even though for now it forces bit vectors to use the SLV package of wrapper types. This could be solved later, and it's not a big problem because the units I test have a fixed size: genericity went out of the window once the netlists were synthesised.

    Now this forces me to re-architect the whole code because now the wrapper is the top level. The previous choice has been discussed in the log 24. Hierarchy problems and solutions and we're back to the initial version:

    The bash script now knows the size of the vectors and can also "plug" the generator or driver, but the rest of the code must be turned inside out like a sock. I have to rewrite/restructure the code from scratch, but the individual modifications are minor.

    The new structure has the wrapper containing the "driver", so the wrapper has no I/O ports; they are replaced by internal signals. Another change is that the driver is made of procedures and functions, and it would be weird to encapsulate them in another entity. A main procedure with in and out arguments would work better and reduce the semantic complexity: in VHDL, a concurrent procedure call can work like an entity, so let's exploit this trick :-) And the procedure can be nicely packaged in a separate library.

    Some more typing and I get the following example out of the ALU:

    -- Wrapper.vhdl
    -- Generated on 20200915-07:57
    -- Do not modify
    Library ieee;
        use ieee.std_logic_1164.all;
    Library work;
        use work.all;
    Library LibreGates;
        use LibreGates.Gates_lib.all;
        use LibreGates.Netlist_lib.all;
    entity Wrapper is
      generic (
        WrapperWidthIn : integer := 22;
        WrapperWidthOut: integer := 17;
        main_delay : time := 1 sec;
        filename : string := ""; -- file of exclude vectors
        verbose : string := ""
      );
    end Wrapper;
    architecture Wrap of Wrapper is
      signal VectIn : std_logic_vector(WrapperWidthIn -1 downto 0);
      signal VectOut: std_logic_vector(WrapperWidthOut-1 downto 0);
      signal VectClk: std_logic := 'X';
    begin
      -- force the registration of the gates.
      -- must be placed at the very beginning!
      update_select_gate( -- called only once at start
        0, -- show gate listing when < 1
       -1, -- no alteration
        2, -- probe mode
        verbose, filename);
      Drive_DUT(VectIn, VectOut, VectClk, main_delay);
      -- Finally we "wire" the unit to the ports:
      tb: entity alu8 port map (
        cin => VectIn(0),
        neg => VectIn(1),
        passmask => VectIn(2),
        orxor => VectIn(3),
        rop2mx => VectIn(4),
        cmps => VectIn(5),
        sri => VectIn(13 downto 6),
        snd => VectIn(21 downto 14),
        rop2 => VectOut(7 downto 0),
        sum => VectOut(15 downto 8),
        cout => VectOut(16));
    end Wrap; 
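    The port map above follows a simple packing rule: inputs are concatenated into VectIn and outputs into VectOut, in declaration order, starting at bit 0. The following minimal bash sketch reproduces that rule. It is not the actual LibreGates script: the name gen_port_map is hypothetical, and it assumes one "name width direction" line per port, with width 1 meaning a scalar std_logic port.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: generate the port map of a wrapper like Wrapper.vhdl.
# stdin: one port per line, "name width direction" (direction = in | out).
# Inputs are packed into VectIn, outputs into VectOut, LSB first, in order.
# Assumption: a width of 1 means a scalar std_logic port.
gen_port_map() {
  local name width dir vec lo hi i
  local in_pos=0 out_pos=0      # next free bit of VectIn / VectOut
  local -a assoc=()             # collected "formal => actual" associations
  while read -r name width dir; do
    [[ -z "$name" ]] && continue
    case "$dir" in
      "in")  vec=VectIn;  lo=$in_pos;  in_pos=$((in_pos + width)) ;;
      "out") vec=VectOut; lo=$out_pos; out_pos=$((out_pos + width)) ;;
      *)     echo "wrong port direction for signal $name" >&2; return 1 ;;
    esac
    hi=$((lo + width - 1))
    if (( width == 1 )); then
      assoc+=("    $name => $vec($lo)")
    else
      assoc+=("    $name => $vec($hi downto $lo)")
    fi
  done
  # every association but the last ends with a comma; the last closes the map
  for i in "${!assoc[@]}"; do
    if (( i < ${#assoc[@]} - 1 )); then
      echo "${assoc[$i]},"
    else
      echo "${assoc[$i]});"
    fi
  done
  echo "-- WrapperWidthIn => $in_pos, WrapperWidthOut => $out_pos"
}
```

    Fed with the ALU ports (cin, neg, passmask, orxor, rop2mx and cmps of width 1, then sri and snd of width 8 as inputs; rop2 and sum of width 8, then cout of width 1 as outputs), it emits the same associations as the wrapper above, plus the totals 22 and 17 that match the generics.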

    The line use LibreGates.Gates_lib.all; is required because update_select_gate() is used. Drive_DUT() is defined in the new package Netlist_lib, to which all the netlist management functions are moved from Vectgen.vhdl, which will soon be obsolete.
    So now all the "fun stuff" is going to happen in Netlist_lib.vhdl, where the size of the vectors is very easy to get with the 'LENGTH attribute:

    package body Netlist_lib is
      procedure Drive_DUT(
        signal VectIn :   out std_logic_vector;
        signal VectOut: in    std_logic_vector;
        signal VectClk: inout std_logic;
              Interval: in    time) is
      begin
        report   "VectIn: "  & integer'image(VectIn'length)
             & ", VectOut: " & integer'image(VectOut'length)
             & ", CLK: "     & std_logic'image(VectClk) ;
      end Drive_DUT;
    end Netlist_lib;

    There, it's solved. And I can even sense if the clock signal is connected, by setting it to 'X' or 'U' in the wrapper.

    Well no, it's not totally and definitively solved because it's a hack and it can't catch the std_logic_vectors. It works but it's not scalable.

    Unai did some code at for his project and there is hope that libghdl will be more and better used in the future. I didn't want to add more...


  • Another method to create the wrapper

    Yann Guidon / YGDES • 09/09/2020 at 23:12 • 0 comments

    For now I manually create the "wrapper" unit that maps the inputs and outputs to a pair of unified vectors.

    In private, Tristan told me there was some sort of Python thing that could do what I need but I'd rather avoid Python and experimental (like, less than 5 years old) features.

    But there could be another way to extract the inputs and outputs from the compiled design : with the trace dumps ! GTKwave knows the type and direction of each port so maybe I can parse the trace dump to extract the info I want.

    I'll have to test this ASAP, as I am currently contemplating how to make GetVectorsSize better.

    Trying to play with GHDL in the LibreGates/tests/test2_INC8 directory:

    > PA3="-P../../LibreGates -P../../LibreGates/proasic3"
    > ghdl -a $PA3 INC8_ASIC.vhdl
    > ghdl -e $PA3 INC8
    > ./inc8 --vcd=inc8.vcd
    > less inc8.vcd
      Sat Sep 12 05:18:00 2020
      GHDL v0
      1 fs
    $var reg 8 ! a[7:0] $end
    $var reg 8 " y[7:0] $end
    $var reg 1 # v $end
    $var reg 8 $ ab[7:0] $end
    $var reg 1 % a012 $end
    $var reg 1 & a012a $end
    $var reg 1 ' a012b $end
    $var reg 1 ( a34 $end
    $var reg 1 ) a345 $end
    $var reg 1 * a3456 $end
    $scope module (0) $end
    $scope module x4 $end
    $var reg 1 + a $end
    $var reg 1 , y $end
    $var reg 1 - t $end
    $scope module e $end


    I get a dump of signals but NO indication of type (which can only be assumed), direction or hierarchy...

    From the doc:

    "Currently, there is no way to select signals to be dumped: all signals are dumped, which can generate big files."
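    Still, a VCD header is plain text and easy to scan. A minimal sketch (vcd_vars is a hypothetical helper, not part of GHDL or LibreGates) that lists each $var with its $scope path, reference name and bit width shows what can be recovered: the entity's ports (a, y, v) come out side by side with internal signals such as ab or a012, and nothing gives a direction or a VHDL type.

```bash
#!/usr/bin/env bash
# Sketch: list the $var declarations of a VCD header as "scope name width".
# Reads the files given as arguments, or stdin when there are none.
# VCD records widths and instance scopes, but no port direction or VHDL type.
vcd_vars() {
  awk '
    $1 == "$scope"   { path = path "/" $3 }            # enter an instance
    $1 == "$upscope" { sub("/[^/]*$", "", path) }      # leave it
    $1 == "$var"     { name = $5
                       sub(/\[.*/, "", name)           # drop the [7:0] suffix
                       scope = (path == "") ? "/" : path
                       print scope, name, $3 }         # $3 is the bit width
    $1 == "$enddefinitions" { exit }                   # end of the header
  ' "$@"
}
```

    Usage: vcd_vars inc8.vcd prints one line per dumped signal, such as "/ a 8" for the input port a.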

    There could be more hope with the ./inc8 --wave=inc8.ghw command, but that would require more effort because the dump is binary, undocumented and subject to change...

    A different approach uses another underrated feature of GHDL:

    ghdl --file-to-xml $PA3  INC8_ASIC.vhdl

    The -a command is replaced here with the --file-to-xml command which, as the name suggests, dumps an XML stream...

    Here we find the definitions of the ports. They are contained in the "port_chain" tag:

      <el kind="interface_signal_declaration" identifier="a" mode="in">
        <subtype_indication identifier="slv8"></subtype_indication>
      <el kind="interface_signal_declaration" identifier="y" mode="out">
        <subtype_indication identifier="slv8"></subtype_indication>
      <el kind="interface_signal_declaration" identifier="v" mode="out">
        <subtype_indication identifier="sl"></subtype_indication>

    I have removed a LOT of text to get only what is required.

    This allows the use of the wrapper types SLVx and SL, but now we need XML processing software to list and extract only the needed information...

    Well, this is where it gets interesting: at first I thought that using the wrapper types would make things harder, but looking at the XML, they look like a great way to simplify the parsing, because the std_logic_vectors use a complex system to define the bounds. OTOH the type SLV8 makes a static statement about the type and size, which is not flexible or generics-configurable but much more direct to extract.
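    A minimal sketch of how direct that extraction becomes, assuming the naming convention seen above (sl is one bit, slvN is N bits). subtype_width and sum_widths are hypothetical helpers; sum_widths reads one "identifier mode subtype" line per port, as a crude scan of the XML could produce.

```bash
#!/usr/bin/env bash
# Sketch: bit width of a wrapper subtype name, assuming "sl" = 1 bit and
# "slvN" = N bits (case-insensitive, so SLV8 and slv8 are the same).
subtype_width() {
  local t=${1,,}
  if [[ "$t" == sl ]]; then
    echo 1
  elif [[ "$t" =~ ^slv([0-9]+)$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unknown subtype: $1" >&2      # e.g. a plain std_logic_vector
    return 1
  fi
}

# Sum the port widths per direction from "identifier mode subtype" lines.
sum_widths() {
  local name mode st w n_in=0 n_out=0
  while read -r name mode st; do
    [[ -z "$name" ]] && continue
    w=$(subtype_width "$st") || return 1
    case "$mode" in
      "in")  n_in=$((n_in + w)) ;;
      "out") n_out=$((n_out + w)) ;;
      *)     echo "wrong port direction for signal $name" >&2; return 1 ;;
    esac
  done
  echo "WrapperWidthIn=$n_in WrapperWidthOut=$n_out"
}
```

    For the three INC8 ports above (a in slv8, y out slv8, v out sl), it reports 8 input bits and 9 output bits. A std_logic_vector port is rejected rather than guessed, since its bounds would require parsing the more complex range description.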

    Well, thanks to some unexpected hack found on StackOverflow, it seems that XML can be crudely parsed with some dirty-as-ever bash script.

    # LibreGates/
    # created sam. sept. 12 09:29:47 CEST 2020
    ReadTag () {
      IFS=\> read -d \< ENTITY
    GetAttribute () {      #input list in $1
      LIST=($ATTRIBUTES)   # split string into array
      for i in $(seq "${#LIST[@]}") ; do  # scan the list
        IFS='"' read -a ATTRIBUTE <<< "${LIST[$i]}"  # split the individual parameter in a sub-string
        #echo "  $ATTRIBUTE   -   ${ATTRIBUTE[1]}"
        if [[ $ATTRIBUTE = "$1" ]] ; then   # found the key
          echo -n "${ATTRIBUTE[1]}"         # return the value
    FlushPort () {
      if [[ "$PORT_TYPE" != "" ]] ; then
        case "$INOUT" in
          "in")  ;;
          "out") ;;
          *) echo "wrong port direction for signal $PORTNAME"
        echo "$PORTNAME $PORT_TYPE $INOUT...
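    The ReadTag function above relies on a well-known bash trick: read -d '<' consumes the text up to the next '<', and an IFS of '>' splits it into the tag (name and attributes) and the text that follows. Here is a separate, self-contained sketch of the same trick applied to the port declarations shown earlier. It is not the author's code: the names read_tag, attr and list_ports are hypothetical, and it only handles simple subtype names such as slv8 and sl.

```bash
#!/usr/bin/env bash
# Each call consumes the text up to the next '<'; with IFS set to '>',
# what precedes '>' lands in TAG and the text after it in CONTENT.
# At end of input, read fails and the text after the last '<' is dropped
# (here it is only a closing tag).
read_tag() {
  local IFS='>'
  read -r -d '<' TAG CONTENT
}

# Value of attribute $2 in the tag text $1 (empty if absent).
attr() {
  sed -n "s/.* $2=\"\([^\"]*\)\".*/\1/p" <<< "$1"
}

# Print "identifier mode subtype" for each port declaration read from stdin.
list_ports() {
  local TAG CONTENT name="" mode=""
  while read_tag; do
    case "$TAG" in
      'el kind="interface_signal_declaration"'*)
        name=$(attr "$TAG" identifier)
        mode=$(attr "$TAG" mode) ;;
      'subtype_indication '*)
        if [[ -n "$name" ]]; then        # first subtype after a port
          echo "$name $mode $(attr "$TAG" identifier)"
          name=""
        fi ;;
    esac
  done
}
```

    Usage: ghdl --file-to-xml $PA3 INC8_ASIC.vhdl | list_ports should print one line per port, provided the XML keeps the attribute layout shown above.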


  • 1

    Get the latest package version from

  • 2

    Execute the script.

    This will build the libraries and run many self-tests.

    The examples in the tests directory also show the various ways to use this library.

  • 3

    You can directly use this library with all the ProASIC3 standard files, either without the analytics system (the "simple" version) or with the full analytics system (the standard version).

    In the "simple" case, with full simulation speed and no analysis, your VHDL source code contains these lines:

    Library proasic3;
        use proasic3.all;

    Then you point GHDL to the right library with this command line:

    ghdl -a -Psomepath/LibreGates/proasic3/simple my_file.vhdl

    If you need full analysis, use the standard version instead and add these clauses to the VHDL testbench:

    Library LibreGates;
        use LibreGates.all;
        use LibreGates.Gates_lib.all;

    Then you modify the inclusion path:

    ghdl -a -Psomepath/LibreGates/proasic3 my_file.vhdl
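    If you switch between the two flavours often, a tiny helper can keep the -P option in one place. This is a hypothetical sketch: pa3_flags and the LIBREGATES variable are not part of LibreGates, and somepath stands for wherever you unpacked the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the GHDL -P option selecting a ProASIC3 flavour.
# LIBREGATES is an assumption: set it to wherever LibreGates was unpacked.
LIBREGATES=${LIBREGATES:-somepath/LibreGates}

pa3_flags() {
  case "$1" in
    simple)   echo "-P$LIBREGATES/proasic3/simple" ;;  # full speed, no analysis
    standard) echo "-P$LIBREGATES/proasic3" ;;         # full analytics system
    *)        echo "usage: pa3_flags simple|standard" >&2; return 1 ;;
  esac
}
```

    Then ghdl -a $(pa3_flags standard) my_file.vhdl analyses the testbench against the full analytics version. The tests/test2_INC8 example also passes a -P option for the LibreGates directory itself, so you may need to add -P$LIBREGATES as well.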



Dylan Brophy wrote 3 days ago:

"Design For Test or nothing !" - YES, good practice. Need more of this IMO.


Yann Guidon / YGDES wrote 3 days ago:

It must be the default.
Not just for HW but also SW.


frenchie68 wrote 3 days ago:

I wonder what you meant (wrt. software that is).


Yann Guidon / YGDES wrote 3 days ago:

I mean that systematic unit testing and thorough proofing/stressing/benchmarking are faint afterthoughts in most SW projects. People only query Google when they have a bug, a question or just want to learn how to do a specific task. Nowhere do I see tutorials about testbenches; it's mostly a culture where "it works so it's done".

One of the few exceptions is GCC, which is (was?) shipped with a suite of conformance self-tests.

But take any user-facing program and testing is "eventual" and "manual".

