VHDL library for gate-level verification

Collection of ASIC cells and ProASIC3 "tiles" in VHDL so I can design and verify optimised code without the proprietary libraries

This is a collection of VHDL files that re-implement the gates of an FPGA family whose structure is close to ASIC cells, and that I use to evaluate ASIC designs. It works for 4-input LUTs even though it started with Actel/Microsemi/Microchip's official definitions (LUT-3 gates). I don't address exact delays for very accurate simulation because they are proprietary and I don't need such precision at the early stages of prototyping.

This library replaces the proprietary files during simulation of synthesised results, so I can make truly Free and Open Source designs, which can later be ported to ASIC technology. This is still a work in progress but the latest features provide useful alternatives to the proprietary tools: I added custom extensions that provide very useful verification features such as logical cone checks and BIST coverage checks, and I am currently working on BIST vector generation.

Apart from shell scripts, all the code is portable VHDL'93 and uses GHDL (not mcode).

This project is "more or less related" to #Shared Silicon and aims to better prepare and prototype designs in FPGA before committing to ASIC tapeout. I'm currently using this library for the #YGREC8 to check design sanity and testability (which is also why you'll find many references and files from Y8 here).

Lately, Christos joined forces and opened new perspectives and applications for this library, for test/validation/fuzzing through the injection of faults. It enables the verification of test benches and other critical tools, for fault-tolerant circuits, wafer-stepper test vectors...

Currently I'm working on the 2nd version of the library, a deeply refactored and enhanced version, and aiming at a v3 with most desired features in the near future. This becomes even more pressing with the MPW shuttles promised by Google on the Skywater 130nm node.

The library implements gates that can work in one of several modes, as explained in the log Modes of operation.

What can you do now with this ?

  • Simulate your mapped design (the initial purpose). The "simple" implementation ( >= v2.8 ) works anywhere, it's fast but without any fancy feature.
  • Check that the gates behave correctly, with histograms of their activity, and detect unused cases (this helps optimise/simplify a design, or ensure it is optimal)
  • Inject unusual signals at inputs to observe the logical cone
  • Alter the function of a given gate (simulate a hardware fault)
  • Compare which alternative architecture toggles the fewest nets, and investigate toggle-reduction strategies to reduce switching-induced power consumption.
  • Implement arbitrary logic with the generic gates (arbitrary LUT16 is supported since 20191213, so the library is not restricted to ProASIC3 designs)
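
As an illustration of the "generic gate" idea, here is a minimal sketch of a 2-input gate whose function is given by a truth table. This is not the library's actual code: the entity name, the generic name LUT and the bit ordering (entry i is the output for input value i = 2*A + B) are assumptions made for this example only.

```vhdl
-- Hypothetical sketch : a 2-input gate driven by a 4-entry truth table.
-- Names and bit ordering are illustrative, not the library's own.
library ieee;
use ieee.std_logic_1164.all;

entity lut2_sketch is
  generic (
    -- entry i is the output for input value i = 2*A + B (assumed ordering)
    LUT : std_logic_vector(0 to 3) := "0110"   -- XOR2
  );
  port (
    A, B : in  std_logic;
    Y    : out std_logic
  );
end lut2_sketch;

architecture behav of lut2_sketch is
begin
  process (A, B)
    variable idx : integer range 0 to 3;
  begin
    if is_x(A) or is_x(B) then
      Y <= 'X';                 -- propagate unknowns instead of guessing
    else
      idx := 0;
      if to_x01(A) = '1' then idx := idx + 2; end if;
      if to_x01(B) = '1' then idx := idx + 1; end if;
      Y <= LUT(idx);            -- plain table lookup, no delay
    end if;
  end process;
end behav;
```

With this sketch, LUT => "0001" behaves as an AND2, and changing a single entry (for example "0111" instead of "0110") is one possible way to model a gate whose function has been altered.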

What's intended ? (not yet implemented)

  • BIST verification (brute-force through all the gates is possible thanks to parallelism)
  • Automatic Test Vector Generation
  • Import/Export EDIF ? CircuitJS ?

It all started as a Free collection of 3-input gates and some additional ProASIC3-specific modules, used to design my own systems.

See the license/ directory for the AGPLv3 terms of distribution.

The proasic3v3/ directory contains all the gates and modules, rewritten to simulate the real tiles and hard macros.

You'll find a series of examples in the testxxxxxx directories that implement various 8-bit units using only 3-input gates and with a well-defined latency.

As of 20200715, the v2 supports these tiles and hard macros:

No input:

1 input:

2 inputs:
and2    and2a   and2b   nand2   nand2a  nand2b  nor2    nor2a   nor2b   or2     or2a    or2b    xnor2   xor2

3 inputs:
and3    and3a   and3b   and3c   ao1     ao12    ao13    ao14    ao15    ao16    ao17    ao18    ao1a    ao1b    ao1c    ao1d    ao1e    aoi1    aoi1a   aoi1b   aoi1c   aoi1d   aoi5    ax1     ax1a    ax1b    ax1c    ax1d    ax1e    axo1    axo2    axo3    axo5    axo6    axo7    axoi1   axoi2   axoi3   axoi4   axoi5   axoi7   maj3    maj3x   maj3xi  min3    min3x   min3xi  mx2     mx2a    mx2b    mx2c    nand3   nand3a  nand3b  nor3    nor3a  ...



Fixed the latches, minor fix, major effect !

x-compressed-tar - 315.66 kB - 07/27/2020 at 05:53



defined some standard 4-inputs gates, updated INC8, added Gray6, fixed a big bug with DFF

x-compressed-tar - 194.01 kB - 07/18/2020 at 04:51



Version 2.8 can generate the "simple" definitions.

x-compressed-tar - 163.07 kB - 03/24/2020 at 17:22



v2.8 Library "simple" implementation with direct definition and NO external dependency or analysis capability, for compatibility. No delay

x-vhdl - 76.33 kB - 03/24/2020 at 17:18



Last release of 2019, easier to integrate in other projects.

x-compressed-tar - 160.00 kB - 12/29/2019 at 20:11



  • Flip Flops (should) work...

    Yann Guidon / YGDES, 07/18/2020 at 05:01

    I just added a test of a few DFFs ( DFN1E1C0 , DFN1C0 ) in the form of a Gray code counter. This uncovered a biiig bug in my code and libraries, which is now fixed. Burn all previous versions !

    I also rearranged the order of the sanity tests / examples so the shortest come first.

    This new release is already 198668 bytes when compressed...

    I am now considering how I could integrate the FF in the verification system, they use a LUT2 so far so they are excluded from the algorithm...

  • v2.9 : introducing 4-input gates

    Yann Guidon / YGDES, 07/14/2020 at 20:11

    These newcomers are not part of the ProASIC3 family but are welcome extensions to the gates library, as discussed in 107. Choosing the gates. Say hello to OAI22, AOI22, OAI211 and AOI211 !

    Oh and after I added them, I found this interesting study about them :

    so I must not be completely wrong here.
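
    For reference, the usual textbook functions of these four complex gates can be sketched as below. This is only a behavioural illustration: the entity name, the port names and the choice of which inputs feed the 2-input term are assumptions, and the library's own definitions may differ.

```vhdl
-- Behavioural sketch of the four complex gates.
-- Port names A..D and Y_* are assumptions made for this example.
library ieee;
use ieee.std_logic_1164.all;

entity complex4_sketch is
  port (
    A, B, C, D : in  std_logic;
    Y_AOI22, Y_OAI22, Y_AOI211, Y_OAI211 : out std_logic
  );
end complex4_sketch;

architecture behav of complex4_sketch is
begin
  -- AND-OR-INVERT, two 2-input AND terms
  Y_AOI22  <= not ((A and B) or (C and D));
  -- OR-AND-INVERT, two 2-input OR terms
  Y_OAI22  <= not ((A or B) and (C or D));
  -- AND-OR-INVERT, one 2-input AND term and two single inputs
  Y_AOI211 <= not ((A and B) or C or D);
  -- OR-AND-INVERT, one 2-input OR term and two single inputs
  Y_OAI211 <= not ((A or B) and C and D);
end behav;
```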

  • A simple option

    Yann Guidon / YGDES, 03/24/2020 at 17:29

    I think I skipped v2.7 that was on the drawing board (I still need to make more examples), but v2.8 is here !


    It adds (or restores) a feature/behaviour that was considered in the beginning, then abandoned. Lately I thought it would be cool

    1. to be able to simulate the designs at a slightly faster speed (though it's subjective)
    2. to be able to synthesise the designs on different platforms that don't understand/know Actel's legacy conventions.

    The last point convinced me so here it is : PA3_definitions_simple_nodelay.vhdl

    I copy-pasted the code that generates the LUTs. I had to modify the code of the MX2x gates and this also introduces a timing inconsistency but it's minor and not used (since it's apparent only when timing is used AND I'm NANDifying my code).

    This version doesn't use external definitions that are necessary for analysis and introspection. It's purely so the Actelified source code can work on Lattice/Xilinx/Intel/etc.

    The default version remains the "trace" one, and it can be regenerated at will, with the timing you want, since I have provided 2 scripts and the generator can be rerun as you like.

  • v2.6 release

    Yann Guidon / YGDES, 12/29/2019 at 20:24

    It's the last update of this project for 2019 !


    I have reorganised the files and directories a bit... More needs to be done but it's satisfying so far and I need to go back to work on the ALU8 unit. It was a good opportunity to test that the library integrates well with other projects so now I'm back to the #YGREC8 project for a bit.

    See you in the next decade !

  • Zombies in v2.6

    Yann Guidon / YGDES, 12/28/2019 at 02:42

    I'm still trying to figure out a good method to solve the final problem of vector generation.

    Meanwhile I'm also making v2.6 more solid with a better detection of netlist warts, such as unconnected sinks and sources. I label those faulty gates or ports "zombies". I don't make much effort to "clean up" the netlist; instead I bail out if the netlist contains zombies. There is no point in trying because the vectors make sense only on a final design. However I try to make the output less cryptic and more useful when the tool is used as a simple netlist checker during design and for regression tests.

    Also : the netlist will not "see" GND and VCC gates because they have no input. They don't make sense anyway, though A3P netlists often contain these... I consider these as "NOFIX".

    A3Ptiles_v2.6_20191228.tgz has all the good stuff.

    More info can be found in the test4_cornercases directory. The weird_unit.vhdl file implements the following circuits:

    ff1 and ff2 implement a well-known set/reset flip-flop circuit and the outputs are "sequential" because they depend on the previous state, so they are "zombies". Both NAND2 are correctly flagged.

    The mx0 gate implements a transparent latch. mx0 is correctly flagged, as well as the next gate that depends on its output.

    Out(4) is left unconnected.

    open1 and open2 are two chained gates with the final output dangling open. This is where things get a bit complex because open2 is correctly flagged as a zombie but not open1. Removing both would affect other parts of the system, such as reducing the depth of the design, which would create an avalanche of other effects...

    open3 is correctly flagged as a zombie because both inputs are unconnected.

    The end of the report says this:

       Latency of the 5 outputs :
          Output#0 : N/A
          Output#1 : N/A
          Output#2 : N/A
          Output#3 : N/A
          Output#4 : N/A
      Found 6 zombie gates or inputs (unconnected or loops) :
     - Gate #6 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open2@and2(trace):lut4
     - Gate #1 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff1@nand2(trace):lut4
     - Gate #2 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff2@nand2(trace):lut4
     - Gate #3 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):mx0@mx2(trace):lut8
     - Gate #4 : fanout=1 - :vectgen(plip):wrap@vg_wrapper(weird):dummy@and2(trace):lut4
     - Gate #7 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open3@and2(trace):lut4

    I chose to flag the errors rather than correct them (by pruning), for these reasons :

    • The correction/remedy could make the situation worse if there is a misunderstanding or a bug
    • It is not the purpose of the system : I just want to make sure the vectors are generated with a sane dataset.
    • It was not planned in advance and this "late feature" is harder to add (it was not straight-forward to implement anyway)
    • Feature creep is bad and I want to KISS.

    So I hope people can use this tool as a filter. I try to present useful information so that users can correct their design.

  • DepthLists

    Yann Guidon / YGDES, 12/24/2019 at 22:16

    v2.6 is looking good !

    I already have the gatelist which is, as the name implies, the list of gates, and their connections are working well. It is now supplemented by the "depthlist", a 2D array of gate references. It simplifies the design of algorithms that scan forward or backward in the circuit.

    Here is the new display for the INC8 unit :

      ************ DEPTHLIST ************
     - Input #0 : fanout=4
          1 : Gate #1(0) - inc8):dut@inc8(tiles):e_r0b@inv(trace):lut2
          2 : Gate #2(0) - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          3 : Gate #3(0) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          4 : Gate #4(0) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
     - Input #1 : fanout=3
          1 : Gate #2(1) - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          2 : Gate #3(1) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          3 : Gate #4(1) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
     - Input #2 : fanout=2
          1 : Gate #3(2) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          2 : Gate #4(2) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
     - Input #3 : fanout=4
          1 : Gate #5(0) - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #6(0) - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          3 : Gate #7(1) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          4 : Gate #8(0) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
     - Input #4 : fanout=3
          1 : Gate #6(1) - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          2 : Gate #7(2) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #8(1) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
     - Input #5 : fanout=3
          1 : Gate #9(2) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          2 : Gate #10(1) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          3 : Gate #8(2) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
     - Input #6 : fanout=2
          1 : Gate #11(2) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          2 : Gate #10(2) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
     - Input #7 : fanout=2
          1 : Gate #12(2) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(2) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     Depth 1 : 6 gates.
     - Gate #1 : fanout=1  Depth min=1  max=1  LUT="10" - inc8):dut@inc8(tiles):e_r0b@inv(trace):lut2
          1 : Output #0
     - Gate #2 : fanout=1  Depth min=1  max=1  LUT="0110" - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          1 : Output #1
     - Gate #3 : fanout=1  Depth min=1  max=1  LUT="01010110" - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          1 : Output #2
     - Gate #4 : fanout=6  Depth min=1  max=1  LUT="00000001" - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
          1 : Gate #5(1) - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #7(0) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #9(0) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          4 : Gate #11(0) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          5 : Gate #12(0) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          6 : Gate #13(0) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #6 : fanout=2  Depth min=1  max=1  LUT="0001" - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          1 : Gate #10(0) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          2 : Gate #9(1) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
     - Gate #8 : fanout=1  Depth min=1  max=1  LUT="00000001" - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
          1 : Gate #11(1) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
     Depth 2 : 5 gates.
     - Gate #5 : fanout=1  Depth min=1  max=2  LUT="0110" - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          1 : Output #3
     - Gate #7 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          1 : Output #4
     - Gate #10 : fanout=2  Depth min=1  max=2  LUT="00000001" - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          1 : Gate #12(1) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(1) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #9 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          1 : Output #5
     - Gate #11 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          1 : Output #6

  • The right depth

    Yann Guidon / YGDES, 12/22/2019 at 04:16

    I've redesigned the algorithm that explores/registers the depth of all the gates and outputs and the result is pretty good :

      ************ FIXING DEPTHLIST ************
     ----- Depth=1
       > registering Gate #1
       > registering Gate #2
       > registering Gate #3
       > registering Gate #4
       > registering Gate #6
       > registering Gate #8
     ----- Depth=2
     found Output #0
     found Output #1
     found Output #2
       > registering Gate #5
       > registering Gate #7
       > registering Gate #10
       > registering Gate #9
       > registering Gate #11
     ----- Depth=3
     found Output #3
     found Output #4
       > registering Gate #12
       > registering Gate #13
     found Output #5
     found Output #6
     ----- Depth=4
     found Output #8
     found Output #7
     DepthList : fixed

    The previous version suffered from a few small issues that became a real problem when I tried to add "loop detection" (such as a flip-flop made of cross-interlocking gates).

    The new algorithm uses a different approach, where a gate is re-added to the "to-scan list" when all its inputs have been scanned already, and have a definite "depth".

    A counter for every gate is initialised with the gate's number of inputs and it is decremented each time an input is registered.

    At the end, if the counter is not zero, then the gate has a missing input (or there is a bug).

    There is the special case of the VCC/GND gates with no input... but they shouldn't be used in ASICs, right ?

    The nice thing about the new approach is that I merged it with a new 2D gatelist that is organised with the depth, respective to the input, so it's easier to display the circuit.
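
    The counter idea can be sketched on a toy netlist as follows. This is a simplified illustration, not the library's code: the netlist encoding (negative number = primary input, positive number = gate), the names and the fixed 2 inputs per gate are assumptions, and the sketch rescans the whole table at each depth instead of maintaining a "to-scan list".

```vhdl
-- Hypothetical sketch of depth registration with one counter per gate.
-- All names and the netlist encoding are made up for this example.
entity depth_sketch is
end depth_sketch;

architecture demo of depth_sketch is
  constant NGATES : integer := 5;
  constant MAXIN  : integer := 2;   -- every gate has 2 inputs here
  type net_t is array (1 to NGATES, 1 to MAXIN) of integer;
  -- source of each gate input : negative = primary input, positive = gate
  constant NET : net_t := (
    (-1, -2),    -- gate 1 : In0, In1
    (-2, -3),    -- gate 2 : In1, In2
    ( 1,  2),    -- gate 3 : gate 1, gate 2
    ( 3,  5),    -- gate 4 : gate 3, gate 5  (loop with gate 5)
    ( 4, -1));   -- gate 5 : gate 4, In0     (loop with gate 4)
  type int_vec is array (1 to NGATES) of integer;
begin
  process
    variable pending   : int_vec := (others => MAXIN); -- inputs not registered yet
    variable depth     : int_vec := (others => -1);    -- -1 = depth not known yet
    variable src       : integer;
    variable src_depth : integer;
    variable progress  : boolean;
  begin
    for d in 1 to NGATES loop
      progress := false;
      for g in 1 to NGATES loop
        for p in 1 to MAXIN loop
          src := NET(g, p);
          if src < 0 then
            src_depth := 0;             -- primary inputs sit at depth 0
          else
            src_depth := depth(src);
          end if;
          -- each source has a single depth, so each input is counted once
          if src_depth = d - 1 then
            pending(g) := pending(g) - 1;
            if pending(g) = 0 then      -- all inputs have a definite depth
              depth(g) := d;
              progress := true;
              report "registering Gate #" & integer'image(g)
                   & " at depth " & integer'image(d);
            end if;
          end if;
        end loop;
      end loop;
      exit when not progress;
    end loop;
    -- a non-zero counter means a missing input or a loop
    for g in 1 to NGATES loop
      if pending(g) /= 0 then
        report "Gate #" & integer'image(g)
             & " : missing input or loop" severity warning;
      end if;
    end loop;
    wait;
  end process;
end demo;
```

    In this toy netlist, gates 1 and 2 are registered at depth 1 and gate 3 at depth 2, while gates 4 and 5 feed each other and keep a non-zero counter, so both get reported.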

  • v2.6 : the netlist generation

    Yann Guidon / YGDES, 12/15/2019 at 16:20

    The latest developments seem to be successful !

    v2.6 is progressing and I can already list not only the gates but their interconnections !

     13x A3P gates found.
     no exclusion input file to read.
      Input vector : 8 bits, Output vector : 9 bits
     Netlist : fixed
      ************ NETLIST ************
     - Input #0 : fanout=4
          1 : Gate #1(0) - inc8(tiles):e_r0b@inv(trace):lut2
          2 : Gate #2(0) - inc8(tiles):e_r1b@xor2(trace):lut4
          3 : Gate #3(0) - inc8(tiles):e_r2b@ax1c(trace):lut8
          4 : Gate #4(0) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #1 : fanout=3
          1 : Gate #2(1) - inc8(tiles):e_r1b@xor2(trace):lut4
          2 : Gate #3(1) - inc8(tiles):e_r2b@ax1c(trace):lut8
          3 : Gate #4(1) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #2 : fanout=2
          1 : Gate #3(2) - inc8(tiles):e_r2b@ax1c(trace):lut8
          2 : Gate #4(2) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #3 : fanout=4
          1 : Gate #5(0) - inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #6(0) - inc8(tiles):e_r4a@and2(trace):lut4
          3 : Gate #7(1) - inc8(tiles):e_r4b@ax1c(trace):lut8
          4 : Gate #8(0) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #4 : fanout=3
          1 : Gate #6(1) - inc8(tiles):e_r4a@and2(trace):lut4
          2 : Gate #7(2) - inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #8(1) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #5 : fanout=3
          1 : Gate #9(2) - inc8(tiles):e_r5b@ax1c(trace):lut8
          2 : Gate #10(1) - inc8(tiles):e_r6a@and3(trace):lut8
          3 : Gate #8(2) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #6 : fanout=2
          1 : Gate #11(2) - inc8(tiles):e_r6b@ax1c(trace):lut8
          2 : Gate #10(2) - inc8(tiles):e_r6a@and3(trace):lut8
     - Input #7 : fanout=2
          1 : Gate #12(2) - inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(2) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #1 : fanout=1  Depth=  LUT="10" - inc8(tiles):e_r0b@inv(trace):lut2
          1 : Output #0
     - Gate #2 : fanout=1  Depth=  LUT="0110" - inc8(tiles):e_r1b@xor2(trace):lut4
          1 : Output #1
     - Gate #3 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r2b@ax1c(trace):lut8
          1 : Output #2
     - Gate #4 : fanout=6  Depth=  LUT="00000001" - inc8(tiles):e_r3a@and3(trace):lut8
          1 : Gate #5(1) - inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #7(0) - inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #9(0) - inc8(tiles):e_r5b@ax1c(trace):lut8
          4 : Gate #11(0) - inc8(tiles):e_r6b@ax1c(trace):lut8
          5 : Gate #12(0) - inc8(tiles):e_r7a@and3(trace):lut8
          6 : Gate #13(0) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #5 : fanout=1  Depth=  LUT="0110" - inc8(tiles):e_r3b@xor2(trace):lut4
          1 : Output #3
     - Gate #6 : fanout=2  Depth=  LUT="0001" - inc8(tiles):e_r4a@and2(trace):lut4
          1 : Gate #10(0) - inc8(tiles):e_r6a@and3(trace):lut8
          2 : Gate #9(1) - inc8(tiles):e_r5b@ax1c(trace):lut8
     - Gate #7 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r4b@ax1c(trace):lut8
          1 : Output #4
     - Gate #8 : fanout=1  Depth=  LUT="00000001" - inc8(tiles):e_r5a@and3(trace):lut8
          1 : Gate #11(1) - inc8(tiles):e_r6b@ax1c(trace):lut8
     - Gate #9 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r5b@ax1c(trace):lut8
          1 : Output #5
     - Gate #10 : fanout=2  Depth=  LUT="00000001" - inc8(tiles):e_r6a@and3(trace):lut8
          1 : Gate #12(1) - inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(1) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #11 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r6b@ax1c(trace):lut8
          1 : Output #6
     - Gate #12 : fanout=1  Depth=  LUT="00000001" - inc8(tiles):e_r7a@and3(trace):lut8
          1 : Output #8
     - Gate #13 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r7b@ax1c(trace):lut8
          1 : Output #7
      ************ END OF NETLIST ************

    This is the netlist extracted from the INC8 unit and it's getting better and better. It matches well with the schematic :

    I'm polishing things and I should add a few things :

    • histogram of the fanouts...
    • find a way to activate/trigger a given LUT entry
    • "compile" the "depth" and check for loops or disconnected stuff
    • ...

    The latest update has a few enhancements and the ALU8 passes the netlist extractor :-)

  • Hierarchy problems and solutions

    Yann Guidon / YGDES, 12/08/2019 at 04:30

    VHDL is a crazy rich language but with crazy idiosyncrasies... It tries to enforce "good practices" by promoting certain constructs and banning others, which can make your life harder sometimes.

    Here I want to speak about the crown jewels of the library : a system that takes arbitrary logic/boolean circuits (implemented with this very library only), extracts the netlist and generates a small set of test vectors.

    This is not really a "black box" approach because we have the source code but I don't want to even have to consider analysing it, this would mean digging into GHDL-specific features and a long-term risk. Thanks to this library, I can use a "grey box" approach because I can access the inputs and outputs. Somehow. It's not a panacea but enough to get us going in the right direction.

    The early idea looks like this :

    We have the D.U.T. integrated in the VG program/entity through a wrapper that transforms the bunch of wires into a couple of bland std_logic_vectors. Our vector algorithm won't care a bit about what's inside or how to interface to it, it's all just bits to read and write...

    There is just one little big problem: the number of input and output bits is usually given by a generic parameter/number, but here it is provided "from the inside out" (or bottom-up) by the DUT/Wrapper, while VHDL "promotes" the reverse : generics enforce the top-bottom hierarchy and are provided by the top-level entity. Which can't guess in advance what's inside...

    One could use configurations or even external generics but I want to keep the whole thing as lean and easy to use as possible. Ideally, the Wrapper would be generated automatically though at this stage, it's much faster and easier to do it by hand. Later I'll find out how GHDL can help, as Tristan told me.

    One natural solution is to change the hierarchy.

    Now the wrapper encapsulates the whole thing, instead of being a mere translator/connector. A tiny advantage is that the DUT gets one level higher in the hierarchy, which will shorten the logs (a bit). There are two small wrinkles though:

    • The wrapper should be as lean as possible, and easily computer-generated. If the VG is integrated, it adds complexity and any change in its interface (for control and reporting for example) will force a redesign of the wrapper and the wrapper generator...
    • The order of inclusion matters. A lot. The wrapper has the burden of preserving it.

    So I return to the first hierarchy with a twist : I let the system auto-configure itself through some simple tricks....

    This system leaves the wrapper (and its generator) free from any consideration about the top level, by just routing a few wires here and there. The generics go in the right direction now and the vector generator could implement as many inputs and outputs as desired, and even more. The initial phase simply loops over the in and out vectors to determine the number of used bits, it doesn't take much time anyway, before it does the rest of the useful work.

    Sounds like a good plan.

    And here is the full source code for the wrapper of INC8 :
    -- A3Pv2.6/test2_INC8/Wrap_INC8.vhdl
    -- version Sun Dec  8 08:07:34 CET 2019 : forked from INC8_tb.vhdl
    -- Released under the GNU AGPLv3 license
    Library ieee;
        use ieee.std_logic_1164.all;
    Library work;
        use work.all;
    entity VG_Wrapper is
      generic (
        VectGenWidthIn : integer :=  9;
        VectGenWidthOut: integer := 10
      );
      port (
        VectIn : in  std_logic_vector(VectGenWidthIn -1 downto 0);
        VectOut: out std_logic_vector(VectGenWidthOut-1 downto 0);
        VectClk: in std_logic
      );
    end VG_Wrapper;
    architecture Wrap_INC8 of VG_Wrapper is
    begin
      dut: entity INC8 port map (
         -- here we "wire" the unit to the Vector Generator ports:
         A => VectIn(7 downto 0),
         Y => VectOut(7 downto 0),
         V => VectOut(8)
      );
      -- the wires for "autoconfig":
      VectOut(9) <= '1';
      VectOut(VectGenWidthOut-1) <= VectIn(8);
    end Wrap_INC8;

     That's it !

    As the name says, it's just a wrapper so nothing...


  • Winner scanner

    Yann Guidon / YGDES, 12/03/2019 at 04:14

    The above picture shows the scanning pattern of the enhanced testbench for INC8. Not that it makes a big difference, since the scan is quite fast (1s on my i7) but for the ALU8, which lasts a few minutes at this moment, the 54% time saved will mean quite a lot...

    It works well for the INC8 and the ALU because they rely on carry propagation (of some sort). The algorithm uses a dual loop (outer forward, inner backwards) that "hits" the powers of 2 sooner than a simple linear scan : the index 128 will be reached after 64 iterations, for example. Many "failure modes" appear on powers of two, or the index before (like : 127 and 128) so reaching them faster is good. This results in the "inverse sawtooth" pattern of the above picture.

    This is boosted by another trick called "folding" that tests an index and its opposite. This creates the "horizontal mirror" of the picture. The resulting algorithm is a bit subtle but efficient and small: 

    -- 1743 cycles vs 3995 in linear mode !
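    procedure reverse_folding is
      variable j : integer := 1;  -- the current power of 2
      variable k : integer := 0;  -- the inferior limit for the reverse scan
      variable l : integer := 0;  -- the sub-loop counter for reverse scan
    begin
      loop                        -- outer loop : forward, over the powers of 2
        l := j;
        loop                      -- inner loop : backwards, from j down to k
          if (l < 128) then -- 128 appears 2x
            test_cycle(    l);
          end if;
          test_cycle(255-l);      -- "folding" : the opposite index (line reconstructed from the description, assumed)
          l := l-1;
          exit when l < k;
        end loop;
        k := j+1;
        j := j+j;
        exit when j > 128;
      end loop;
    end reverse_folding;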
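
    To check that such a scan does not miss any index, the loop can be dropped into a tiny self-checking testbench. The sketch below is an illustration only: test_cycle is a stand-in that merely records the visited index, the entity name is made up, and the folded call is assumed to be test_cycle(255-l), which is inferred from the description of "folding" rather than taken from the project files.

```vhdl
-- Hypothetical coverage check for the reverse-folding scan order.
entity reverse_folding_demo is
end reverse_folding_demo;

architecture demo of reverse_folding_demo is
begin
  process
    type seen_t is array (0 to 255) of boolean;
    variable seen  : seen_t := (others => false);
    variable calls : integer := 0;

    -- stand-in for the real test_cycle : only records the visited index
    procedure test_cycle(idx : integer) is
    begin
      calls := calls + 1;
      seen(idx) := true;
    end test_cycle;

    variable j : integer := 1;  -- the current power of 2
    variable k : integer := 0;  -- the inferior limit for the reverse scan
    variable l : integer := 0;  -- the sub-loop counter for reverse scan
  begin
    loop
      l := j;
      loop
        if l < 128 then         -- 128 is reached through the folded call
          test_cycle(l);
        end if;
        test_cycle(255 - l);    -- assumed form of the "folded" index
        l := l - 1;
        exit when l < k;
      end loop;
      k := j + 1;
      j := j + j;
      exit when j > 128;
    end loop;

    -- every index from 0 to 255 must have been visited at least once
    for i in seen'range loop
      assert seen(i)
        report "index " & integer'image(i) & " was never tested"
        severity failure;
    end loop;
    report "all indices visited in " & integer'image(calls) & " calls";
    wait;
  end process;
end demo;
```

    Under that assumption, the direct calls cover 0 to 127 and the folded calls cover 127 to 255, so the final assertions should all pass.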

    We'll see soon enough if this cuts the run time of the ALU8 tests !

    Well, guess what ?

    For the thorough testing of the ALU8,

    • 13.248.331 simulation cycles in 383s to check all the faults with linear scanning
    • 1.121.723 cycles in 36s with reverse-folding !

    so it's roughly a 10x increase in processing efficiency !

    Upload: soon

    20191205 :

    down to 34s and only 931.316 cycles with this dumb simple tweak : I swapped the SRI and SND ports !

    I don't know how but I'll have to try some bit shufflings. However the search space is out of range : 16! = 20.922.789.888.000...


  • 1
    Get and install GHDL

    get it there :

    also read

    Be sure to not use the MCODE version... The GCC and LLVM backends seem to work well.

  • 2

    under Linux :

    tar xzvf A3Ptiles_v2.6_20191229.tgz
    cd A3Pv2.6

    This will run scripts for a minute or so, compiling everything and testing the results for consistency.

  • 3
    Use in your project

    Copy (or symlink) the directory proasic3v2 in your project.

    Use the tests/examples to setup your scripts and include the proper paths to the files.





Yann Guidon / YGDES wrote 07/27/2020 at 06:05 point

Another important fix (makes the latches work as intended) : update to YGREC8_VHDL.20200727.tgz !


Yann Guidon / YGDES wrote 07/17/2020 at 03:16 point

TODO for v2.10 : consider DFFs as inputs and outputs...


Tim wrote 07/12/2020 at 19:58 point

Looks exciting! Still very incomplete as of now, not even spice models of their nmos/pmos.


Yann Guidon / YGDES wrote 07/12/2020 at 21:54 point

yes but it's due soon !


Tim wrote 07/12/2020 at 22:37 point

I hope this will also help improve all the OSS VLSI-Design toolchains. Magic is old.

The discussions about "130 nm is an ancient technology, why not something more recent" are quite funny. Almost all of this "maker stuff" revolves around designs in 130 nm (afaik STM32F103) or much larger (AVR), not to speak of all the sensors, PMICs and LED-drivers in 350 nm or 180 nm.

I would be perfectly happy with reliable and low cost access to 350nm. Still not too sure about the conditions surrounding the skywater offer.


Yann Guidon / YGDES wrote 07/12/2020 at 23:59 point

I agree about 130nm : it's a great node for "us". decent Vcore, good I/O voltages, fast, dense, and this fab supports Flash cells.
I'm discussing with the  @Tim Ansell about the conditions, it's very welcoming !


Tim Ansell wrote 07/13/2020 at 00:18 point

The exact conditions are still being decided but the general idea is "free as in beer" for "free as in freedom".

You can have a look over Google's documentation about licenses for open source projects at for some idea of what is going to be suitable.

If you don't have a strong preference, just use Google's default license choice of Apache 2.0 -- From

> "The Apache license is our preferred license not only for source code but also for hardware."

Stay well away from the licenses listed under


Tim wrote 07/13/2020 at 06:42 point

Cool! Having no restriction on the PDK and derived products would already be a huge anomaly/change.

I was also wondering about the criteria to be able to use their manufacturing service. They mentioned it would be free for eligible designs (?). I guess there have to be some limits.

Looking at current MPW costs at 130 nm, this is still off by one or two orders of magnitude for a noncommercial project without external funding. So they must do something to really bring cost down or put up a high barrier.


Yann Guidon / YGDES wrote 07/13/2020 at 09:01 point

Hi @Tim Ansell  :-)
Thanks for the explanations and details !

I looked at the links about the licences and find that AGPL is considered "toxic" to your employer, though it makes perfect sense in the current situation, because we're not building a webservice for example. We all agree that the purpose is to totally open as many levels of the design pipeline as possible and AGPL unambiguously goes in that direction. We all want to have the full "source code" of the chips we have/own to ensure safety and security.

This gates library as well as most of my "FOSS" projects are under AGPL.


Tim wrote 07/13/2020 at 10:41 point

Oh, I did not realise Tim Ansell was directly involved.


Tim Ansell wrote 07/13/2020 at 16:39 point

@Yann Guidon / YGDES - I'm not a lawyer and so won't argue about reasons around the licenses.

The lawyers have said we can *not* accept AGPL files. Projects under AGPL won't be eligible for inclusion in the shuttle run.


Yann Guidon / YGDES wrote 08/08/2019 at 10:36 point

Damnit I found a wrong declaration for NAND2, with 3 inputs in PA3_components.vhdl :-/


Yann Guidon / YGDES wrote 12/02/2018 at 19:24 point


so far, I need the following gates for INC8 :


I haven't looked yet at your library of CMOS cells at #Shared Silicon ...


SHAOS wrote 12/02/2018 at 23:54 point

What is "AX1"? My INV, AND2 and AND3 were "silicon proven"

I have XOR2 and NXOR2 too, but I didn't test them in silicon yet


Yann Guidon / YGDES wrote 12/03/2018 at 04:31 point

From the files :

architecture rtl of AX1 is
begin
  Y <= (A and B) xor C after gate_delay;
end rtl;

so it's a XOR2 with one input being the result of AND2. The absence of output buffer of AND2 makes the combined gate faster.


SHAOS wrote 12/03/2018 at 04:34 point

so I guess it could be built out of AND2 and XOR2 then :)


Yann Guidon / YGDES wrote 12/03/2018 at 05:11 point

It could but it would use more surface and be slower.

That's why it's interesting to design and analyse the logic with A3P : if you limit yourself to basic gates, you miss some useful optimisations.

This library of cells helps me design tighter circuits and when they are stable and optimised, I can then focus on the CMOS library to implement the required functions (if they are not available yet).

And your role becomes clear at this point ;-)


SHAOS wrote 12/03/2018 at 06:32 point

I don't think a custom implementation of AND-XOR will be much smaller or faster than a simple combination of AND and XOR, maybe just a little bit...


Yann Guidon / YGDES wrote 12/03/2018 at 12:03 point

This little bit could make a significant difference :-)


Yann Guidon / YGDES wrote 12/03/2018 at 12:31 point

the merging saves the space of the intermediate buffer, as well as the interconnect between AND2 and XOR2.

