VHDL library for gate-level verification

Hard macros and tiles of Actel/Microsemi's ProASIC3 so I can design and verify optimised code without the proprietary libraries

This is a collection of VHDL files that re-implement the gates of an FPGA family whose structures are close to those of ASICs, and which I use to evaluate ASIC designs. It started with Actel/Microsemi/Microchip's official definitions (3-input LUT gates) and now also works with 4-input LUTs. I don't model exact delays for very accurate simulation because they are proprietary, and I don't need such precision at the early stages of prototyping.

This library replaces the proprietary files during the simulation of synthesised results, so I can make truly Free and Open Source designs that can later be ported to ASIC technology. It is still a work in progress, but the latest features provide useful alternatives to the proprietary tools. I added custom extensions for verification, such as logical cone checks and BIST coverage checks, and I am currently working on BIST vector generation.

The VHDL code is portable VHDL'93, but only GHDL scripts are provided, and the mcode version of GHDL is not supported.

This project is "more or less related" to #Shared Silicon. It aims to better prepare and prototype designs in FPGA before committing to an ASIC tapeout. I'm currently using this library for the #YGREC8 to check design sanity and testability, which is also why you'll find many references to Y8, and files from it, here.

Lately, Christos joined forces and opened new perspectives and applications for this library: test, validation and fuzzing through fault injection. This enables the verification of test benches and other critical tools, for fault-tolerant circuits, wafer-stepper test vectors...

I'm currently working on the 2nd version of the library, deeply refactored and enhanced, and I'm aiming at a v3 with most of the desired features in the near future.


The library implements gates that can work in one of several modes, as explained in the log "Modes of operation".

What can you do now with this ?

  • Simulate your mapped design (the initial purpose). The "simple" implementation (v2.8 and later) works anywhere; it's fast but has no fancy features.
  • Check that the gates behave correctly, with histograms of their activity, and detect unused cases (this helps optimise/simplify a design, or ensure it is optimal)
  • Inject unusual signals at inputs to observe the logical cone
  • Alter the function of a given gate (simulate a hardware fault)
  • Compare which alternative architecture toggles the fewest nets, and investigate toggle-reduction strategies that reduce switching-induced power consumption.
  • Implement arbitrary logic with the generic gates (arbitrary 4-input LUT16 gates have been supported since 20191213, so the library is not restricted to ProASIC3 designs)
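The "generic gates" mentioned above boil down to lookup tables. The following is a minimal sketch of that idea, not the library's own code. It models a 4-input gate as a 16-entry truth table and uses a second table as a mask that flips entries to emulate a hardware fault.

The names lut_sketch, lut_eval, lut4_gate, TABLE and FAULT are hypothetical. The input ordering is also an assumption: A is taken as the most significant bit of the index, with entry 0 as the leftmost character. This appears consistent with LUT strings such as "0110" (xor2) and "01010110" (ax1c) in the netlist dumps of the logs below.

```vhdl
-- Hypothetical sketch, not the library's actual code: a 4-input gate whose
-- function is a 16-entry truth table, plus a mask that flips table entries
-- to emulate a faulty gate.
library ieee;
use ieee.std_logic_1164.all;

package lut_sketch is
  subtype lut16 is std_logic_vector(0 to 15);  -- entry 0 is the leftmost
  function lut_eval(t : lut16; a, b, c, d : std_logic) return std_logic;
end package lut_sketch;

package body lut_sketch is
  -- Assumed ordering: A is the most significant bit of the table index.
  function lut_eval(t : lut16; a, b, c, d : std_logic) return std_logic is
    variable ins : std_logic_vector(3 downto 0);
    variable idx : integer := 0;
  begin
    ins := a & b & c & d;
    for i in 3 downto 0 loop
      case ins(i) is
        when '1' | 'H' => idx := idx * 2 + 1;
        when '0' | 'L' => idx := idx * 2;
        when others    => return 'X';  -- unknown input: unknown output
      end case;
    end loop;
    return t(idx);
  end function lut_eval;
end package body lut_sketch;

library ieee;
use ieee.std_logic_1164.all;
use work.lut_sketch.all;

entity lut4_gate is
  generic (
    TABLE : lut16;                     -- the gate's truth table
    FAULT : lut16 := (others => '0')   -- '1' flips that entry (fault)
  );
  port (A, B, C, D : in std_logic; Y : out std_logic);
end entity lut4_gate;

architecture sketch of lut4_gate is
  constant EFFECTIVE : lut16 := TABLE xor FAULT;
begin
  Y <= lut_eval(EFFECTIVE, A, B, C, D);
end architecture sketch;

library ieee;
use ieee.std_logic_1164.all;

entity tb_lut4 is
end entity tb_lut4;

architecture sim of tb_lut4 is
  signal a, b, c, d  : std_logic := '1';
  signal y_ok, y_bad : std_logic;
begin
  -- AND4: only entry 15 (A=B=C=D='1') is '1'
  good : entity work.lut4_gate
    generic map (TABLE => "0000000000000001")
    port map (A => a, B => b, C => c, D => d, Y => y_ok);
  -- the same AND4 with entry 15 flipped
  bad : entity work.lut4_gate
    generic map (TABLE => "0000000000000001", FAULT => "0000000000000001")
    port map (A => a, B => b, C => c, D => d, Y => y_bad);

  check : process
  begin
    wait for 1 ns;                     -- all inputs are '1'
    assert y_ok = '1'  report "AND4 should output 1" severity failure;
    assert y_bad = '0' report "faulty AND4 should output 0" severity failure;
    d <= '0';
    wait for 1 ns;                     -- entry 14 is unaffected by the fault
    assert y_ok = '0' and y_bad = '0' report "entry 14 mismatch" severity failure;
    report "lut4 sketch: all checks passed";
    wait;
  end process check;
end architecture sim;
```

To run it with GHDL, analyse the file with ghdl -a, then run ghdl -e tb_lut4 and ghdl -r tb_lut4. The final report is printed only if both checks pass.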

What's intended ? (not yet implemented)

  • BIST verification (brute-force through all the gates is possible thanks to parallelism)
  • Automatic Test Vector Generation
  • Import/Export EDIF ? CircuitJS ?

It all started as a Free collection of 3-input gates and some additional ProASIC3-specific modules, used to design my own systems.

See the license/ directory for the AGPLv3 terms of distribution.

The proasic3v3/ directory contains all the gates and modules, rewritten to simulate the real tiles and hard macros.

You'll find a series of examples in the testxxxxxx directories. They implement various 8-bit units using only 3-input gates, with a well-defined latency.


As of 20190819, the v2 supports these tiles and hard macros:

No input:
GND VCC

1 input:
BUFD   BUFF   CLKINT INV    INVD

2 inputs:
and2    and2a   and2b   nand2   nand2a  nand2b  nor2    nor2a   nor2b   or2     or2a    or2b    xnor2   xor2

3 inputs:
and3    and3a   and3b   and3c   ao1     ao12    ao13    ao14    ao15    ao16    ao17    ao18    ao1a    ao1b    ao1c    ao1d    ao1e    aoi1    aoi1a   aoi1b   aoi1c   aoi1d   aoi5    ax1     ax1a    ax1b    ax1c    ax1d    ax1e    axo1    axo2    axo3    axo5    axo6    axo7    axoi1   axoi2   axoi3   axoi4   axoi5   axoi7   maj3    maj3x   maj3xi  min3    min3x   min3xi  mx2     mx2a    mx2b    mx2c    nand3   nand3a  nand3b  nor3    nor3a   nor3b   nor3c   oa1    ...

A3Ptiles_v2.8_20200324.tgz

Version 2.8 can generate the "simple" definitions.

x-compressed-tar - 163.07 kB - 03/24/2020 at 17:22

PA3_definitions_simple_nodelay.vhdl

v2.8 library: "simple" implementation with direct definitions and NO external dependency or analysis capability, for compatibility. No delay.

x-vhdl - 76.33 kB - 03/24/2020 at 17:18

A3Ptiles_v2.6_20191229.tgz

Last release of 2019, easier to integrate in other projects.

x-compressed-tar - 160.00 kB - 12/29/2019 at 20:11

A3Ptiles_v2.6_20191228.tgz

Added "zombie" gates and ports to solidify v2.6

x-compressed-tar - 160.54 kB - 12/28/2019 at 02:31

A3Ptiles_v2.6_20191225.tgz

added test4_cornercases

x-compressed-tar - 156.42 kB - 12/25/2019 at 23:00


  • A simple option

    Yann Guidon / YGDES, 03/24/2020 at 17:29

    I think I skipped v2.7, which was on the drawing board (I still need to make more examples), but v2.8 is here!

    A3Ptiles_v2.8_20200324.tgz

    It adds (or restores) a feature/behaviour that was considered in the beginning and then abandoned. Lately, I thought it would be cool:

    1. to be able to simulate the designs at a slightly faster speed (though it's subjective)
    2. to be able to synthesise the designs on different platforms that don't understand/know Actel's legacy conventions.

    The last point convinced me, so here it is: PA3_definitions_simple_nodelay.vhdl

    I copy-pasted the code that generates the LUTs. I had to modify the code of the MX2x gates, which introduces a timing inconsistency. It's minor and irrelevant in practice, because it only shows up when timing is used AND I'm NANDifying my code.

    This version doesn't use the external definitions that are necessary for analysis and introspection. It exists purely so that the Actelified source code can work on Lattice/Xilinx/Intel/etc.

    The default version remains the "trace" one, and it can be regenerated at will with the timing you want. I have provided 2 scripts, and the generator can be rerun as often as you like.

  • v2.6 release

    Yann Guidon / YGDES, 12/29/2019 at 20:24

    It's the last update of this project for 2019 !

    A3Ptiles_v2.6_20191229.tgz

    I have reorganised the files and directories a bit... More needs to be done, but it's satisfying so far, and I need to get back to work on the ALU8 unit. It was a good opportunity to check that the library integrates well with other projects, so now I'm back to the #YGREC8 project for a bit.

    See you in the next decade !

  • Zombies in v2.6

    Yann Guidon / YGDES, 12/28/2019 at 02:42

    I'm still trying to figure out a good method to solve the final problem of vector generation.

    Meanwhile, I'm also making v2.6 more solid with better detection of netlist warts, such as unconnected sinks and sources. I label those faulty gates or ports "zombies". I don't make much effort to "clean up" the netlist; instead, I bail out if the netlist contains zombies. There is no point in trying, because the vectors only make sense for a final design. However, I try to make the output less cryptic and more useful when the tool is used as a simple netlist checker during design and for regression tests.

    Also: the netlist will not "see" GND and VCC gates because they have no input. They don't make sense anyway, though A3P netlists often contain them... I consider this a "NOFIX".

    A3Ptiles_v2.6_20191228.tgz has all the good stuff.


    More info can be found in the test4_cornercases directory. The weird_unit.vhdl file implements the following circuits:

    ff1 and ff2 implement the well-known set/reset flip-flop circuit. Their outputs are "sequential" because they depend on the previous state, so they are "zombies". Both NAND2 gates are correctly flagged.

    mx0 implements a transparent latch. mx0 is correctly flagged, as well as the next gate that depends on its output.

    Out(4) is left unconnected.

    open1 and open2 are two chained gates with the final output dangling open. This is where things get a bit complex because open2 is correctly flagged as a zombie but not open1. Removing both would affect other parts of the system, such as reducing the depth of the design, which would create an avalanche of other effects...

    open3 is correctly flagged as a zombie because both inputs are unconnected.

    The end of the report says this:

       Latency of the 5 outputs :
          Output#0 : N/A
          Output#1 : N/A
          Output#2 : N/A
          Output#3 : N/A
          Output#4 : N/A
     
      Found 6 zombie gates or inputs (unconnected or loops) :
     - Gate #6 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open2@and2(trace):lut4
     - Gate #1 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff1@nand2(trace):lut4
     - Gate #2 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):ff2@nand2(trace):lut4
     - Gate #3 : fanout=2 - :vectgen(plip):wrap@vg_wrapper(weird):mx0@mx2(trace):lut8
     - Gate #4 : fanout=1 - :vectgen(plip):wrap@vg_wrapper(weird):dummy@and2(trace):lut4
     - Gate #7 : fanout= N/A - :vectgen(plip):wrap@vg_wrapper(weird):open3@and2(trace):lut4
    

    I chose to flag the errors rather than correct them (by pruning), for these reasons :

    • The correction/remedy could make the situation worse if there is a misunderstanding or a bug
    • It is not the purpose of the system : I just want to make sure the vectors are generated with a sane dataset.
    • It was not planned in advance and this "late feature" is harder to add (it was not straight-forward to implement anyway)
    • Feature creep is bad and I want to KISS.

    So I hope people can use this tool as a filter. I try to present useful information so that users can correct their designs.
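    The two simplest zombie rules described above can be illustrated on a toy netlist: an unconnected input, or an output that drives nothing. This is a hypothetical sketch, not the library's implementation. The names zombie_sketch and find_zombies, and the PRIMARY/UNCONNECTED encoding, are made up here.

    Loops such as the ff1/ff2 pair are not covered; they need the depth-based check discussed in the log "The right depth". Like the report above, these rules flag open2 and open3 but not open1, because open1 still drives open2.

```vhdl
-- Hypothetical sketch of two simple "zombie" rules on a toy netlist:
-- a gate is flagged when one of its inputs is unconnected, or when its
-- output drives nothing (no gate input and no primary output).
package zombie_sketch is
  constant NGATES      : integer := 4;
  constant PRIMARY     : integer := -1;  -- input driven by a primary input
  constant UNCONNECTED : integer := -2;  -- input driven by nothing
  -- src(g, i) = source of input i of gate g: a gate number, PRIMARY or UNCONNECTED
  type src_array  is array (0 to NGATES-1, 0 to 1) of integer;
  type bool_array is array (0 to NGATES-1) of boolean;
  type int_array  is array (0 to NGATES-1) of integer;
  function find_zombies(src : src_array; drives_output : bool_array)
    return bool_array;
end package zombie_sketch;

package body zombie_sketch is
  function find_zombies(src : src_array; drives_output : bool_array)
    return bool_array is
    variable fanout : int_array  := (others => 0);
    variable zombie : bool_array := (others => false);
  begin
    -- count how many gate inputs each gate output drives
    for g in src'range(1) loop
      for i in src'range(2) loop
        if src(g, i) >= 0 then
          fanout(src(g, i)) := fanout(src(g, i)) + 1;
        end if;
      end loop;
    end loop;
    for g in src'range(1) loop
      -- dangling output: nothing reads this gate
      if fanout(g) = 0 and not drives_output(g) then
        zombie(g) := true;
      end if;
      -- undriven input
      for i in src'range(2) loop
        if src(g, i) = UNCONNECTED then
          zombie(g) := true;
        end if;
      end loop;
    end loop;
    return zombie;
  end function find_zombies;
end package body zombie_sketch;

use work.zombie_sketch.all;

entity tb_zombie is
end entity tb_zombie;

architecture sim of tb_zombie is
  -- gate 0 feeds gate 1 (like open1 -> open2), gate 1 dangles,
  -- gate 2 has both inputs open (like open3), gate 3 drives an output
  constant NET  : src_array  := ((PRIMARY, PRIMARY), (0, PRIMARY),
                                 (UNCONNECTED, UNCONNECTED), (PRIMARY, PRIMARY));
  constant OUTS : bool_array := (false, false, false, true);
begin
  process
    variable z : bool_array;
  begin
    z := find_zombies(NET, OUTS);
    for g in z'range loop
      if z(g) then
        report "zombie: gate #" & integer'image(g);
      end if;
    end loop;
    assert z = bool_array'(false, true, true, false)
      report "unexpected zombie list" severity failure;
    wait;
  end process;
end architecture sim;
```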

  • DepthLists

    Yann Guidon / YGDES, 12/24/2019 at 22:16

    v2.6 is looking good !

    I already have the gatelist, which is, as the name implies, the list of gates and their connections, and it works well. It is now supplemented by the "depthlist", a 2D array of gate references. This simplifies the design of algorithms that scan forward or backward through the circuit.

    Here is the new display for the INC8 unit :

      ************ DEPTHLIST ************
      
     - Input #0 : fanout=4
          1 : Gate #1(0) - inc8):dut@inc8(tiles):e_r0b@inv(trace):lut2
          2 : Gate #2(0) - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          3 : Gate #3(0) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          4 : Gate #4(0) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
      
     - Input #1 : fanout=3
          1 : Gate #2(1) - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          2 : Gate #3(1) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          3 : Gate #4(1) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
      
     - Input #2 : fanout=2
          1 : Gate #3(2) - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          2 : Gate #4(2) - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
      
     - Input #3 : fanout=4
          1 : Gate #5(0) - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #6(0) - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          3 : Gate #7(1) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          4 : Gate #8(0) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
      
     - Input #4 : fanout=3
          1 : Gate #6(1) - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          2 : Gate #7(2) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #8(1) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
      
     - Input #5 : fanout=3
          1 : Gate #9(2) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          2 : Gate #10(1) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          3 : Gate #8(2) - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
      
     - Input #6 : fanout=2
          1 : Gate #11(2) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          2 : Gate #10(2) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
      
     - Input #7 : fanout=2
          1 : Gate #12(2) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(2) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     
     Depth 1 : 6 gates.
     - Gate #1 : fanout=1  Depth min=1  max=1  LUT="10" - inc8):dut@inc8(tiles):e_r0b@inv(trace):lut2
          1 : Output #0
     - Gate #2 : fanout=1  Depth min=1  max=1  LUT="0110" - inc8):dut@inc8(tiles):e_r1b@xor2(trace):lut4
          1 : Output #1
     - Gate #3 : fanout=1  Depth min=1  max=1  LUT="01010110" - inc8):dut@inc8(tiles):e_r2b@ax1c(trace):lut8
          1 : Output #2
     - Gate #4 : fanout=6  Depth min=1  max=1  LUT="00000001" - inc8):dut@inc8(tiles):e_r3a@and3(trace):lut8
          1 : Gate #5(1) - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #7(0) - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #9(0) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          4 : Gate #11(0) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          5 : Gate #12(0) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          6 : Gate #13(0) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #6 : fanout=2  Depth min=1  max=1  LUT="0001" - inc8):dut@inc8(tiles):e_r4a@and2(trace):lut4
          1 : Gate #10(0) - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          2 : Gate #9(1) - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
     - Gate #8 : fanout=1  Depth min=1  max=1  LUT="00000001" - inc8):dut@inc8(tiles):e_r5a@and3(trace):lut8
          1 : Gate #11(1) - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
     
     Depth 2 : 5 gates.
     - Gate #5 : fanout=1  Depth min=1  max=2  LUT="0110" - inc8):dut@inc8(tiles):e_r3b@xor2(trace):lut4
          1 : Output #3
     - Gate #7 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r4b@ax1c(trace):lut8
          1 : Output #4
     - Gate #10 : fanout=2  Depth min=1  max=2  LUT="00000001" - inc8):dut@inc8(tiles):e_r6a@and3(trace):lut8
          1 : Gate #12(1) - inc8):dut@inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(1) - inc8):dut@inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #9 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r5b@ax1c(trace):lut8
          1 : Output #5
     - Gate #11 : fanout=1  Depth min=1  max=2  LUT="01010110" - inc8):dut@inc8(tiles):e_r6b@ax1c(trace):lut8
          1 : Output #6
     
     Depth...

  • The right depth

    Yann Guidon / YGDES, 12/22/2019 at 04:16

    I've redesigned the algorithm that explores/registers the depth of all the gates and outputs and the result is pretty good :

      ************ FIXING DEPTHLIST ************
     ----- Depth=1
       > registering Gate #1
       > registering Gate #2
       > registering Gate #3
       > registering Gate #4
       > registering Gate #6
       > registering Gate #8
     ----- Depth=2
     found Output #0
     found Output #1
     found Output #2
       > registering Gate #5
       > registering Gate #7
       > registering Gate #10
       > registering Gate #9
       > registering Gate #11
     ----- Depth=3
     found Output #3
     found Output #4
       > registering Gate #12
       > registering Gate #13
     found Output #5
     found Output #6
     ----- Depth=4
     found Output #8
     found Output #7
     DepthList : fixed
    

    The last version suffered from a few small issues that became real problems when I tried to add "loop detection" (such as a flip-flop made of cross-coupled gates).

    The new algorithm uses a different approach: a gate is added to the "to-scan list" only once all its inputs have been scanned and have a definite "depth".

    A counter for every gate is initialised with the gate's number of inputs and is decremented each time an input is registered.

    At the end, if a counter is not zero, then the gate has a missing input, is part of a loop, or there is a bug.

    There is the special case of the VCC/GND gates with no input... but they shouldn't be used in ASICs, right ?

    The nice thing about the new approach is that I merged it with a new 2D gatelist, organised by depth relative to the inputs, which makes the circuit easier to display.
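    The counter scheme described above is essentially a topological sort (Kahn's algorithm). Here is a minimal sketch on a toy netlist of 2-input gates stored in an array. The names depth_sketch, compute_depths and PRIMARY are hypothetical, not the library's.

    A gate is queued when its counter reaches zero, and its depth is the longest path from the inputs. Any gate whose counter never reaches zero gets -1, which is how a cross-coupled pair shows up.

```vhdl
-- Hypothetical sketch of counter-based depth registration on a toy netlist:
-- each gate's counter starts at its number of gate-driven inputs, and a gate
-- is queued ("registered") when its counter reaches zero.
package depth_sketch is
  constant NGATES  : integer := 5;
  constant PRIMARY : integer := -1;  -- input driven by a primary input
  -- src(g, i) = source of input i of gate g: a gate number or PRIMARY;
  -- any other negative value (e.g. an unconnected input) is never resolved
  type src_array is array (0 to NGATES-1, 0 to 1) of integer;
  type int_array is array (0 to NGATES-1) of integer;
  -- depth of each gate, or -1 when its counter never reached zero
  function compute_depths(src : src_array) return int_array;
end package depth_sketch;

package body depth_sketch is
  function compute_depths(src : src_array) return int_array is
    variable pending : int_array := (others => 0);  -- unresolved inputs
    variable depth   : int_array := (others => 0);
    variable queue   : int_array := (others => 0);  -- the "to-scan list"
    variable head, tail, g : integer := 0;
  begin
    -- initialise the counters; gates fed only by primary inputs get depth 1
    for k in src'range(1) loop
      for i in src'range(2) loop
        if src(k, i) /= PRIMARY then
          pending(k) := pending(k) + 1;
        end if;
      end loop;
      if pending(k) = 0 then
        depth(k)    := 1;
        queue(tail) := k;
        tail        := tail + 1;
      end if;
    end loop;
    -- registering gate g decrements the counter of every input it drives
    while head < tail loop
      g    := queue(head);
      head := head + 1;
      for k in src'range(1) loop
        for i in src'range(2) loop
          if src(k, i) = g then
            if depth(g) + 1 > depth(k) then
              depth(k) := depth(g) + 1;
            end if;
            pending(k) := pending(k) - 1;
            if pending(k) = 0 then
              queue(tail) := k;
              tail        := tail + 1;
            end if;
          end if;
        end loop;
      end loop;
    end loop;
    -- a counter that is still non-zero means a loop or a missing input
    for k in src'range(1) loop
      if pending(k) /= 0 then
        depth(k) := -1;
      end if;
    end loop;
    return depth;
  end function compute_depths;
end package body depth_sketch;

use work.depth_sketch.all;

entity tb_depth is
end entity tb_depth;

architecture sim of tb_depth is
  -- gates 0 -> 1 -> 2 form a chain (2 also reads 0); gates 3 and 4 form a
  -- cross-coupled loop, like a flip-flop made of two gates
  constant NET : src_array := ((PRIMARY, PRIMARY), (0, PRIMARY), (0, 1),
                               (4, PRIMARY), (3, PRIMARY));
begin
  process
    variable d : int_array;
  begin
    d := compute_depths(NET);
    for g in d'range loop
      report "gate #" & integer'image(g) & " depth " & integer'image(d(g));
    end loop;
    assert d = int_array'(1, 2, 3, -1, -1)
      report "unexpected depths" severity failure;
    wait;
  end process;
end architecture sim;
```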

  • v2.6 : the netlist generation

    Yann Guidon / YGDES, 12/15/2019 at 16:20

    The latest developments seem to be successful!

    v2.6 is progressing and I can already list not only the gates but their interconnections !

     13x A3P gates found.
     no exclusion input file to read.
      Input vector : 8 bits, Output vector : 9 bits
     Netlist : fixed
      ************ NETLIST ************
     - Input #0 : fanout=4
          1 : Gate #1(0) - inc8(tiles):e_r0b@inv(trace):lut2
          2 : Gate #2(0) - inc8(tiles):e_r1b@xor2(trace):lut4
          3 : Gate #3(0) - inc8(tiles):e_r2b@ax1c(trace):lut8
          4 : Gate #4(0) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #1 : fanout=3
          1 : Gate #2(1) - inc8(tiles):e_r1b@xor2(trace):lut4
          2 : Gate #3(1) - inc8(tiles):e_r2b@ax1c(trace):lut8
          3 : Gate #4(1) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #2 : fanout=2
          1 : Gate #3(2) - inc8(tiles):e_r2b@ax1c(trace):lut8
          2 : Gate #4(2) - inc8(tiles):e_r3a@and3(trace):lut8
     - Input #3 : fanout=4
          1 : Gate #5(0) - inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #6(0) - inc8(tiles):e_r4a@and2(trace):lut4
          3 : Gate #7(1) - inc8(tiles):e_r4b@ax1c(trace):lut8
          4 : Gate #8(0) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #4 : fanout=3
          1 : Gate #6(1) - inc8(tiles):e_r4a@and2(trace):lut4
          2 : Gate #7(2) - inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #8(1) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #5 : fanout=3
          1 : Gate #9(2) - inc8(tiles):e_r5b@ax1c(trace):lut8
          2 : Gate #10(1) - inc8(tiles):e_r6a@and3(trace):lut8
          3 : Gate #8(2) - inc8(tiles):e_r5a@and3(trace):lut8
     - Input #6 : fanout=2
          1 : Gate #11(2) - inc8(tiles):e_r6b@ax1c(trace):lut8
          2 : Gate #10(2) - inc8(tiles):e_r6a@and3(trace):lut8
     - Input #7 : fanout=2
          1 : Gate #12(2) - inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(2) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #1 : fanout=1  Depth=  LUT="10" - inc8(tiles):e_r0b@inv(trace):lut2
          1 : Output #0
     - Gate #2 : fanout=1  Depth=  LUT="0110" - inc8(tiles):e_r1b@xor2(trace):lut4
          1 : Output #1
     - Gate #3 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r2b@ax1c(trace):lut8
          1 : Output #2
     - Gate #4 : fanout=6  Depth=  LUT="00000001" - inc8(tiles):e_r3a@and3(trace):lut8
          1 : Gate #5(1) - inc8(tiles):e_r3b@xor2(trace):lut4
          2 : Gate #7(0) - inc8(tiles):e_r4b@ax1c(trace):lut8
          3 : Gate #9(0) - inc8(tiles):e_r5b@ax1c(trace):lut8
          4 : Gate #11(0) - inc8(tiles):e_r6b@ax1c(trace):lut8
          5 : Gate #12(0) - inc8(tiles):e_r7a@and3(trace):lut8
          6 : Gate #13(0) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #5 : fanout=1  Depth=  LUT="0110" - inc8(tiles):e_r3b@xor2(trace):lut4
          1 : Output #3
     - Gate #6 : fanout=2  Depth=  LUT="0001" - inc8(tiles):e_r4a@and2(trace):lut4
          1 : Gate #10(0) - inc8(tiles):e_r6a@and3(trace):lut8
          2 : Gate #9(1) - inc8(tiles):e_r5b@ax1c(trace):lut8
     - Gate #7 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r4b@ax1c(trace):lut8
          1 : Output #4
     - Gate #8 : fanout=1  Depth=  LUT="00000001" - inc8(tiles):e_r5a@and3(trace):lut8
          1 : Gate #11(1) - inc8(tiles):e_r6b@ax1c(trace):lut8
     - Gate #9 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r5b@ax1c(trace):lut8
          1 : Output #5
     - Gate #10 : fanout=2  Depth=  LUT="00000001" - inc8(tiles):e_r6a@and3(trace):lut8
          1 : Gate #12(1) - inc8(tiles):e_r7a@and3(trace):lut8
          2 : Gate #13(1) - inc8(tiles):e_r7b@ax1c(trace):lut8
     - Gate #11 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r6b@ax1c(trace):lut8
          1 : Output #6
     - Gate #12 : fanout=1  Depth=  LUT="00000001" - inc8(tiles):e_r7a@and3(trace):lut8
          1 : Output #8
     - Gate #13 : fanout=1  Depth=  LUT="01010110" - inc8(tiles):e_r7b@ax1c(trace):lut8
          1 : Output #7
      ************ END OF NETLIST ************

    This is the netlist extracted from the INC8 unit and it's getting better and better. It matches well with the schematic :

    I'm polishing things and I should add a few things :

    • histogram of the fanouts...
    • find a way to activate/trigger a given LUT entry
    • "compile" the "depth" and check for loops or disconnected stuff
    • ...

    The latest update has a few enhancements and the ALU8 passes the netlist extractor :-)

  • Hierarchy problems and solutions

    Yann Guidon / YGDES, 12/08/2019 at 04:30

    VHDL is a crazy rich language but with crazy idiosyncrasies... It tries to enforce "good practices" by promoting certain constructs and banning others, which can make your life harder sometimes.

    Here I want to speak about the crown jewels of the library : a system that takes arbitrary logic/boolean circuits (implemented with this very library only), extracts the netlist and generates a small set of test vectors.

    This is not really a "black box" approach, because we have the source code. However, I don't even want to consider analysing it: that would mean digging into GHDL-specific features, which is a long-term risk. Thanks to this library, I can use a "grey box" approach because I can access the inputs and outputs. Somehow. It's not a panacea, but it's enough to get us going in the right direction.

    The early idea looks like this :

    We have the D.U.T. integrated in the VG program/entity through a wrapper that transforms the bunch of wires into a couple of bland std_logic_vectors. Our vector algorithm won't care a bit about what's inside or how to interface to it, it's all just bits to read and write...

    There is just one little big problem: the number of input and output bits is usually given by a generic parameter/number, but here it is provided "from the inside out" (or bottom-up) by the DUT/Wrapper, while VHDL "promotes" the reverse : generics enforce the top-bottom hierarchy and are provided by the top-level entity. Which can't guess in advance what's inside...

    One could use configurations or even external generics, but I want to keep the whole thing as lean and easy to use as possible. Ideally, the wrapper would be generated automatically, though at this stage it's much faster and easier to write it by hand. Later I'll find out how GHDL can help, as Tristan told me.


    One natural solution is to change the hierarchy.

    Now the wrapper encapsulates the whole thing, instead of being a mere translator/connector. A tiny advantage is that the DUT gets one level higher in the hierarchy, which will shorten the logs (a bit). There are two small wrinkles though:

    • The wrapper should be as lean as possible, and easily computer-generated. If the VG is integrated, it adds complexity and any change in its interface (for control and reporting for example) will force a redesign of the wrapper and the wrapper generator...
    • The order of inclusion matters. A lot. The wrapper has the burden of preserving it.

    So I went back to the first hierarchy, with a twist: I let the system auto-configure itself through some simple tricks...

    This system leaves the wrapper (and its generator) free from any consideration about the top level, by just routing a few wires here and there. The generics go in the right direction now and the vector generator could implement as many inputs and outputs as desired, and even more. The initial phase simply loops over the in and out vectors to determine the number of used bits, it doesn't take much time anyway, before it does the rest of the useful work.

    Sounds like a good plan.


    And here is the full source code for the wrapper of INC8 :
    -- A3Pv2.6/test2_INC8/Wrap_INC8.vhdl
    -- version Sun. Dec.  8 08:07:34 CET 2019 : forked from INC8_tb.vhdl
    --
    -- Released under the GNU AGPLv3 license
    
    Library ieee;
        use ieee.std_logic_1164.all;
    Library work;
        use work.all;
    
    entity VG_Wrapper is
      generic (
        VectGenWidthIn : integer :=  9;
        VectGenWidthOut: integer := 10  
      );
      port(	
        VectIn : in  std_logic_vector(VectGenWidthIn -1 downto 0);
        VectOut: out std_logic_vector(VectGenWidthOut-1 downto 0);
        VectClk: in std_logic
      );
    end VG_Wrapper;
    
    architecture Wrap_INC8 of VG_Wrapper is
    begin
    
      dut: entity INC8 port map (
         --	here we	"wire" the unit to the Vector Generator ports:
         A => VectIn(7 downto 0),
         Y => VectOut(7 downto 0),
         V => VectOut(8)
      );
    
      -- the wires for "autoconfig":
      VectOut(9) <= '1';
      VectOut(VectGenWidthOut-1) <=	VectIn(8);
    end Wrap_INC8;
    

     That's it !

    As the name says, it's just a wrapper so nothing...

  • Winner scanner

    Yann Guidon / YGDES, 12/03/2019 at 04:14

    The above picture shows the scanning pattern of the enhanced testbench for INC8. It doesn't make a big difference there, since the scan is quite fast (1s on my i7). For the ALU8, however, which currently takes a few minutes, the 54% time saving will mean quite a lot...

    It works well for the INC8 and the ALU because they rely on carry propagation (of some sort). The algorithm uses a dual loop (outer forward, inner backwards) that "hits" the powers of 2 sooner than a simple linear scan : the index 128 will be reached after 64 iterations, for example. Many "failure modes" appear on powers of two, or the index before (like : 127 and 128) so reaching them faster is good. This results in the "inverse sawtooth" pattern of the above picture.

    This is boosted by another trick called "folding", which tests an index and its complement (255-index). This creates the "horizontal mirror" of the picture. The resulting algorithm is a bit subtle, but efficient and small:

    -- 1743 cycles vs 3995 in linear mode !
    procedure reverse_folding is
      variable j : integer := 1;  -- the current power of 2
      variable k : integer := 0;  -- the inferior limit for the reverse scan
      variable l : integer := 0;  -- the sub-loop counter for reverse scan
    begin
      loop
        l := j;
        loop
          if (l < 128) then -- l=128 would repeat 128 and 127 (both covered by l=127)
            test_cycle(    l);
            test_cycle(255-l);
          end if;
          l := l-1;
          exit when l < k;
        end loop;
        k := j+1;
        j := j+j;
        exit when j > 128;
      end loop;
    end reverse_folding;
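    test_cycle() belongs to the testbench and is not shown. The variant below checks that the reverse-folding order still visits every index from 0 to 255 exactly once. If so, the saving presumably comes from hitting failures earlier, not from skipping vectors.

    This is a hedged, self-contained sketch that records each index instead of running a test cycle. The names folding_order, is_permutation and tb_folding are hypothetical.

```vhdl
-- Hypothetical check, not the author's testbench: the same reverse-folding
-- loops, with each test_cycle(i) call replaced by recording i in an array.
package folding_sketch is
  type index_array is array (0 to 255) of integer;
  function folding_order return index_array;       -- order of the calls
  function is_permutation(ord : index_array) return boolean;
end package folding_sketch;

package body folding_sketch is
  function folding_order return index_array is
    variable ord : index_array := (others => -1);
    variable n : integer := 0;  -- number of recorded test cycles
    variable j : integer := 1;  -- the current power of 2
    variable k : integer := 0;  -- the inferior limit for the reverse scan
    variable l : integer := 0;  -- the sub-loop counter for reverse scan
  begin
    loop
      l := j;
      loop
        if l < 128 then
          ord(n) := l;       n := n + 1;  -- was test_cycle(l)
          ord(n) := 255 - l; n := n + 1;  -- was test_cycle(255-l)
        end if;
        l := l - 1;
        exit when l < k;
      end loop;
      k := j + 1;
      j := j + j;
      exit when j > 128;
    end loop;
    return ord;
  end function folding_order;

  -- true when every index 0..255 appears exactly once
  function is_permutation(ord : index_array) return boolean is
    variable seen : index_array := (others => 0);
  begin
    for i in ord'range loop
      if ord(i) < 0 or ord(i) > 255 then
        return false;
      end if;
      seen(ord(i)) := seen(ord(i)) + 1;
    end loop;
    for v in seen'range loop
      if seen(v) /= 1 then
        return false;
      end if;
    end loop;
    return true;
  end function is_permutation;
end package body folding_sketch;

use work.folding_sketch.all;

entity tb_folding is
end entity tb_folding;

architecture sim of tb_folding is
begin
  process
    variable ord : index_array;
  begin
    ord := folding_order;
    assert is_permutation(ord)
      report "an index is missing or repeated" severity failure;
    report "first calls: " & integer'image(ord(0)) & " " & integer'image(ord(1))
         & " " & integer'image(ord(2)) & " " & integer'image(ord(3));
    wait;
  end process;
end architecture sim;
```

    Running tb_folding should report the first calls as 1 254 0 255.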

    We'll see soon enough if this cuts the run time of the ALU8 tests !


    Well, guess what ?

    For the thorough testing of the ALU8,

    • 13.248.331 simulation cycles in 383s to check all the faults with linear scanning
    • 1.121.723 cycles in 36s with reverse-folding !

    so it's roughly a 10x increase in processing efficiency !

    Upload: soon


    20191205 :

    Down to 34s and only 931.316 cycles with this dumb, simple tweak: I swapped the SRI and SND ports!

    I don't know how yet, but I'll have to try some bit shufflings. However, the search space is out of reach: 16! = 20.922.789.888.000...

  • v2.4

    Yann Guidon / YGDES, 12/01/2019 at 15:28

    I just uploaded A3Ptiles_v2.4_20191201.tgz. It's a pretty modest update compared to the last archive (I added larger integers for the histograms and activity counters, plus a few features), but the cumulative changes deserve a minor version increment! So it's v2.4 already, and more features are brewing: I am coding some files to import exclusion vectors from external files.

    Stay tuned.


    ... and in read_xcl.tgz I prototyped the code that reads the "exclusion files".

    I could read the whole file into memory but I don't want to use more memory than required, particularly during initialisation. So I scan the file along with the list of gates. It's not the most direct/simple method but it is light on real resources and scalable if the DUT grows.

    I'm about to include the mechanism inside the general system.

  • New feature : toggle counter

    Yann Guidon / YGDES, 11/27/2019 at 03:52

    Here's a fun and easy one :

    The newest feature simply counts how many times each gate's output changes value.

    It might sound a bit silly for now, but it was quick to implement. It will be very useful later, when selecting the most appropriate implementation for the decoding logic. Some versions are "as fast as possible" and care about nothing but giving the right result with the smallest possible logic depth. Of course, this generates a lot of logic activity and many spurious, useless changes.

    Other versions use slower but more careful logic, with latches, to minimise the number of control lines that change. This reduces noise and power consumption in CMOS and FPGAs.

    For now there is no application, but this will greatly help the design of the #YGREC8.

    Of course, I'll have to find a way to use 64-bit numbers with GHDL, because I suppose the current limit of 2³¹ cycles and toggles will be easy to exceed.
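    Here is a hypothetical sketch of such a counter; toggle_counter, probe and tb_toggle are made-up names, not the library's. A process wakes up on every event of the watched net and counts in a 64-bit ieee.numeric_std unsigned. This sidesteps the VHDL-93 integer type, whose guaranteed range is only 32 bits.

```vhdl
-- Hypothetical sketch, not the library's code: a toggle counter attached to
-- one net, using a 64-bit unsigned so it does not stop at 2**31.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity toggle_counter is
  port (
    probe : in  std_logic;              -- the gate output being watched
    count : out unsigned(63 downto 0)   -- number of value changes so far
  );
end entity toggle_counter;

architecture sketch of toggle_counter is
begin
  process (probe)
    variable n : unsigned(63 downto 0) := (others => '0');
  begin
    -- the process also runs once at initialisation, when there is no event
    if probe'event then
      n := n + 1;  -- any value change counts, including to/from 'X'
    end if;
    count <= n;
  end process;
end architecture sketch;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_toggle is
end entity tb_toggle;

architecture sim of tb_toggle is
  signal net   : std_logic := '0';
  signal count : unsigned(63 downto 0);
begin
  dut : entity work.toggle_counter port map (probe => net, count => count);

  process
  begin
    for i in 1 to 5 loop
      net <= not net;
      wait for 1 ns;
    end loop;
    net <= net;  -- assigning the same value is not an event
    wait for 1 ns;
    assert count = 5 report "expected 5 toggles" severity failure;
    report "toggles: " & integer'image(to_integer(count));
    wait;
  end process;
end architecture sim;
```

    The testbench should report 5 toggles. The final net <= net is a transaction without a value change, so it is not counted.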

    Whatever...

    Stay tuned and watch for the latest upload in the files sections :-)

  • 1
    Get and install GHDL

    Get it here: https://github.com/ghdl/ghdl/releases

    Also read http://ghdl.free.fr/

    Be sure not to use the mcode version... The GCC and LLVM backends seem to work well.

  • 2
    Test

    Under Linux:

    wget https://cdn.hackaday.io/files/1625946956421696/A3Ptiles_v2.6_20191229.tgz
    tar xzvf A3Ptiles_v2.6_20191229.tgz
    cd A3Pv2.6
    ./run_tests.sh

    This will run scripts for a minute or so, compiling everything and testing the results for consistency.

  • 3
    Use in your project

    Copy (or symlink) the directory proasic3v2 into your project.

    Use the tests/examples to set up your scripts and include the proper paths to the files.

Discussions

Yann Guidon / YGDES wrote 2 days ago

Tim wrote a day ago

Looks exciting! Still very incomplete as of now, not even SPICE models of their NMOS/PMOS.

Yann Guidon / YGDES wrote a day ago

yes but it's due soon !

Tim wrote 21 hours ago

I hope this will also help improve all the OSS VLSI-Design toolchains. Magic is old.

The discussions about "130 nm is an ancient technology, why not something more recent?" are quite funny. Almost all of this "maker stuff" revolves around designs in 130 nm (afaik STM32F103) or much larger (AVR), not to speak of all the sensors, PMICs and LED drivers in 350 nm or 180 nm.

I would be perfectly happy with reliable and low cost access to 350nm. Still not too sure about the conditions surrounding the skywater offer.

Yann Guidon / YGDES wrote 20 hours ago

I agree about 130nm : it's a great node for "us". decent Vcore, good I/O voltages, fast, dense, and this fab supports Flash cells.
I'm discussing the conditions with @Tim Ansell, it's very welcoming!

Tim Ansell wrote 20 hours ago

The exact conditions are still being decided, but the general idea is "free as in beer" for "free as in freedom".

You can have a look over Google's documentation about licenses for open source projects at https://opensource.google/docs/thirdparty/licenses/#types for some idea of what is going to be suitable.

If you don't have a strong preference, just use Google's default license choice of Apache 2.0 -- From https://opensource.google/docs/thirdparty/licenses/#hardware

> The Apache license is our preferred license not only for source code  but also for hardware."

Stay well away from the licenses listed under https://opensource.google/docs/thirdparty/licenses/#banned

  Are you sure? yes | no

Tim wrote 13 hours ago

Cool! Having no restrictions on the PDK and derived products would already be a huge departure from the norm.

I was also wondering about the eligibility criteria for their manufacturing service. They mentioned it would be free for "eligible designs" (?). I guess there have to be some limits.

Judging by current MPW costs at 130 nm, this is still one or two orders of magnitude beyond the budget of a noncommercial project without external funding. So they must either really bring the cost down or put up a high barrier.

Yann Guidon / YGDES wrote 11 hours ago

Hi @Tim Ansell :-)
Thanks for the explanations and details!

I looked at the links about the licences and see that AGPL is considered "toxic" by your employer, though it makes perfect sense in the current situation because we're not building a web service, for example. We all agree that the goal is to open up as many levels of the design pipeline as possible, and AGPL unambiguously goes in that direction. We all want the full "source code" of the chips we own, to ensure safety and security.

This gate library, like most of my "FOSS" projects, is under the AGPL.

Tim wrote 9 hours ago

Oh, I did not realise Tim Ansell was directly involved.

Tim Ansell wrote 3 hours ago

@Yann Guidon / YGDES - I'm not a lawyer, so I won't argue about the reasons behind the license choices.

The lawyers have said we can *not* accept AGPL files. Projects under AGPL won't be eligible for inclusion in the shuttle run.

Yann Guidon / YGDES wrote 08/08/2019 at 10:36

Damn, I found a wrong declaration for NAND2 in PA3_components.vhdl: it has 3 inputs :-/
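For comparison, a NAND2 declaration should have exactly two inputs and one output. The sketch below is not the content of PA3_components.vhdl. It assumes port names A, B and Y with std_logic types, in the style of the AX1 excerpt quoted later on this page, and the package name nand2_decl_sketch is hypothetical.

```vhdl
-- Hypothetical sketch of a two-input NAND2 component declaration.
-- Assumed: port names A, B, Y and std_logic types (not taken from the library).
library ieee;
use ieee.std_logic_1164.all;

package nand2_decl_sketch is
  component NAND2
    port (
      A : in  std_logic;  -- first input
      B : in  std_logic;  -- second input (no third input)
      Y : out std_logic   -- Y = not (A and B)
    );
  end component;
end nand2_decl_sketch;
```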

Yann Guidon / YGDES wrote 12/02/2018 at 19:24

@SHAOS

So far, I need the following gates for INC8:

 INV, XOR2, AX1, AND3, AND2

I haven't looked at your library of CMOS cells in #Shared Silicon yet...

SHAOS wrote 12/02/2018 at 23:54

What is "AX1"? My INV, AND2 and AND3 were "silicon proven".

I have XOR2 and NXOR2 too, but I haven't tested them in silicon yet.

Yann Guidon / YGDES wrote 12/03/2018 at 04:31

From the files:

architecture rtl of AX1 is
begin
  Y <= (A and B) xor C after gate_delay;
end rtl;

So it's an XOR2 with one input driven by an AND2. Since the AND2 part has no output buffer of its own, the combined gate is faster.
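Both claims in this exchange can be checked in simulation: that AX1 is logically the same as an AND2 feeding an XOR2, and that the split version is slower. The VHDL-93 sketch below is self-contained and does not use the library's files. The entity names ax1_merged, ax1_split and tb_ax1 are hypothetical. It assumes one uniform gate_delay per macro, as in the excerpt above, so the merged gate settles after one gate_delay and the AND2 then XOR2 chain after two. It says nothing about the real area or timing of either version.

```vhdl
-- Hypothetical sketch: AX1 versus an AND2 -> XOR2 chain.
-- Assumes one uniform gate_delay per macro; not timing-accurate.
library ieee;
use ieee.std_logic_1164.all;

-- Merged macro: a single expression and a single gate_delay.
entity ax1_merged is
  generic (gate_delay : time := 1 ns);
  port (A, B, C : in std_logic; Y : out std_logic);
end ax1_merged;

architecture rtl of ax1_merged is
begin
  Y <= (A and B) xor C after gate_delay;
end rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Same function from two separate gates joined by an intermediate net.
entity ax1_split is
  generic (gate_delay : time := 1 ns);
  port (A, B, C : in std_logic; Y : out std_logic);
end ax1_split;

architecture rtl of ax1_split is
  signal ab : std_logic;  -- net between the AND2 output and the XOR2 input
begin
  ab <= A and B after gate_delay;   -- AND2
  Y  <= ab xor C after gate_delay;  -- XOR2
end rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Exhaustive self-checking testbench over the 8 input combinations.
entity tb_ax1 is
end tb_ax1;

architecture sim of tb_ax1 is
  constant D : time := 1 ns;  -- the assumed per-macro delay
  signal A, B, C           : std_logic := '0';
  signal Y_merged, Y_split : std_logic;
begin
  u_merged : entity work.ax1_merged
    generic map (gate_delay => D)
    port map (A => A, B => B, C => C, Y => Y_merged);

  u_split : entity work.ax1_split
    generic map (gate_delay => D)
    port map (A => A, B => B, C => C, Y => Y_split);

  stimulus : process
    variable v        : std_logic_vector(2 downto 0);
    variable expected : std_logic;
  begin
    for i in 0 to 7 loop
      v := std_logic_vector(to_unsigned(i, 3));
      A <= v(2);
      B <= v(1);
      C <= v(0);
      expected := (v(2) and v(1)) xor v(0);
      -- The merged macro has settled 1.5 gate delays after the change...
      wait for D + D / 2;
      assert Y_merged = expected
        report "AX1 (merged): wrong output" severity error;
      -- ...while the AND2 -> XOR2 chain needs two gate delays.
      wait for D;
      assert Y_split = expected
        report "AND2 + XOR2: wrong output" severity error;
      wait for 5 * D;  -- idle time before the next combination
    end loop;
    report "done: checked all 8 input combinations";
    wait;
  end process;
end sim;
```

With GHDL, analyse the file with ghdl -a --std=93, then elaborate and run tb_ax1 with ghdl -e and ghdl -r. The two-delay versus one-delay difference only reflects the assumed uniform delay model. The real gain discussed here (no intermediate buffer, no interconnect between the two gates) is physical and does not show up in this kind of simulation.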

SHAOS wrote 12/03/2018 at 04:34

So I guess it could be built out of AND2 and XOR2 then :)

Yann Guidon / YGDES wrote 12/03/2018 at 05:11

It could, but it would use more area and be slower.

That's why it's interesting to design and analyse the logic with A3P: if you limit yourself to basic gates, you miss some useful optimisations.

This cell library helps me design tighter circuits. Once they are stable and optimised, I can focus on the CMOS library to implement the required functions (if they are not available yet).

And your role becomes clear at this point ;-)

SHAOS wrote 12/03/2018 at 06:32

I don't think a custom AND-XOR implementation will be much smaller or faster than a simple combination of AND and XOR, maybe just a little bit...

Yann Guidon / YGDES wrote 12/03/2018 at 12:03

This little bit could make a significant difference :-)

Yann Guidon / YGDES wrote 12/03/2018 at 12:31

The merging saves the area of the intermediate buffer, as well as the interconnect between AND2 and XOR2.
