Application : Displace LFSR in BIST

You might know that I have a very broad range of interests. For example, I know "a bit" about LFSRs and some of their applications, including Built-In Self-Test (BIST) circuits in microchips. LFSRs look like a natural choice for this because they use only DFFs and XOR gates, so "it must be fast and small". I even wrote an article in French about this. But if your circuit already has registers and an ALU, like a CPU... you don't even need an LFSR.

One very useful and desired goal is to determine the characteristics of a chip after fabrication, so it can be "binned" in speed classes, on top of ensuring that all the transistors have been correctly etched. Not only that, but it must be done FAST. Like, in a fraction of a second at most, which increases the need for "efficiency". Now imagine you build an LFSR in the backyard of your ALU+REG complex: this adds wires and reduces the routing efficiency of the chip. It's best to use the circuits themselves to test themselves, right?

Let's say you can characterise the work of a given set of units (Regs, ALU, some peripherals) in about 1/100s, at a given speed (let's say 10MHz for a very conservative design on an old process). This gives you about 100K cycles, so roughly 100K instructions at one instruction per cycle, to exercise all the targeted transistors and wires.

We have already seen that with the w16 parameter, the period is 2.2G cycles and can easily be extended to 4.4G by the flip of a bit, so FULL COVERAGE is possible. Not only that, but the order of the scan is highly "scrambled", unlike a counter's and somewhat like an LFSR's, so the Pisano-carry algorithm can be used both for internal test vector generation and for syndrome compression.
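To make the idea concrete, here is a minimal sketch of such a generator. The update rule (add both registers plus the previous carry, keep the overflow as the new carry), the seed and all the function names are my assumptions for illustration; the project's actual definition may differ in details. A small width is used for the orbit count because walking the whole w16 orbit in Python would take a while.

```python
def pisano_carry_step(x, y, c, width=16):
    """One iteration: add both registers plus the previous carry (assumed rule)."""
    t = x + y + c
    mask = (1 << width) - 1
    # The old y shifts into x, the truncated sum becomes y,
    # and the overflow bit is the new carry.
    return y, t & mask, t >> width


def carry_stream(n, x=0, y=1, c=0, width=16):
    """Return the first n carry bits, i.e. what would appear on the test pad."""
    bits = []
    for _ in range(n):
        x, y, c = pisano_carry_step(x, y, c, width)
        bits.append(c)
    return bits


def orbit_length(x=0, y=1, c=0, width=2):
    """Count steps until the start state comes back (small widths only)."""
    start = (x, y, c)
    limit = 1 << (2 * width + 1)  # number of possible (x, y, c) states
    state = pisano_carry_step(*start, width)
    steps = 1
    while state != start:
        if steps > limit:
            raise ValueError("start state is not on a cycle")
        state = pisano_carry_step(*state, width)
        steps += 1
    return steps


if __name__ == "__main__":
    print(carry_stream(16))        # first carry bits for width 16
    print(orbit_length(width=2))   # orbit length of the toy 2-bit version
```

On a CPU, the loop body is just an add-with-carry and a register move, so the same instructions that generate the sequence also exercise the ALU and the registers.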

And it doesn't even use gates that are outside the reach of the ISA: the ISA tests itself. Now, what do we need to test the speed of the circuit? A simple idea is to bring the carry flag out on one pin (or through the debug interface, since it's just a big MUX, but beware of the speed/delay/buffering) and compare the generated bitstream with one that is stored in a (Q)SPI Flash, for example. A Pi or an FPGA will simply drive the clock (the ProASIC3 has a user-configurable PLL ^_^ ) and initialise the serial Flash chip, which is loaded with the expected "signature bitstream" of the Circuit Under Test (CUT).

Raise the clock slowly, until you see a divergence between the SPI stream and the CUT.
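The sweep itself is a very small loop. Here is a sketch of what the Pi/FPGA side could do, with the real hardware replaced by a hypothetical `run_cut(freq_hz)` callback that returns the bits captured at a given clock. The fake CUT below, which starts flipping one bit above a chosen frequency, is only there to make the example runnable.

```python
def first_divergence(cut_bits, golden_bits):
    """Index of the first mismatching bit, or None if the streams agree."""
    for i, (a, b) in enumerate(zip(cut_bits, golden_bits)):
        if a != b:
            return i
    return None


def sweep_clock(run_cut, golden, freqs_hz):
    """Raise the clock step by step until the CUT diverges from the golden stream.

    Returns (last passing frequency, first failing frequency, mismatch index).
    The last two are None if no frequency in the list fails.
    """
    last_ok = None
    for f in freqs_hz:
        idx = first_divergence(run_cut(f), golden)
        if idx is not None:
            return last_ok, f, idx
        last_ok = f
    return last_ok, None, None


def make_fake_cut(golden, f_max_hz, bad_index):
    """Hypothetical stand-in for the chip: one bit goes wrong above f_max_hz."""
    def run_cut(freq_hz):
        bits = list(golden)
        if freq_hz > f_max_hz:
            bits[bad_index] ^= 1
        return bits
    return run_cut


if __name__ == "__main__":
    golden = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
    cut = make_fake_cut(golden, f_max_hz=42_000_000, bad_index=5)
    steps = range(10_000_000, 100_000_001, 10_000_000)
    print(sweep_clock(cut, golden, steps))
```

The last passing frequency gives the speed bin; the index of the first bad bit hints at which part of the test program broke first.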

Given the size of SPI chips available today (128 Mbit parts are cheap), a full run could last about 1 second at 100MHz, at one bit per clock cycle. It should be fast enough, right? And that does not even exhaust the state space for w16, which runs entirely in software.
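The back-of-the-envelope numbers can be checked in a few lines. This sketch assumes one carry bit compared per clock cycle and a "128M bit" part that really holds 128 × 2^20 bits; the function names are mine.

```python
def cycles_in(clock_hz, seconds):
    """Number of clock cycles available in a given time window."""
    return round(clock_hz * seconds)


def run_time_s(n_bits, clock_hz, bits_per_cycle=1):
    """Time needed to stream n_bits at the given clock (assumes a steady rate)."""
    return n_bits / (clock_hz * bits_per_cycle)


FLASH_BITS = 128 * 2**20  # assumed capacity of a "128M bit" SPI Flash

if __name__ == "__main__":
    # The conservative case from the text: 1/100 s at 10 MHz.
    print(cycles_in(10_000_000, 0.01))
    # Streaming the whole Flash at 100 MHz.
    print(run_time_s(FLASH_BITS, 100_000_000))
    # Share of the 2.2G-cycle w16 period covered by one Flash worth of bits.
    print(FLASH_BITS / 2.2e9)
```

So a whole Flash takes a bit more than one second at 100MHz and still covers only a small share of the w16 period.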

All you have to do is upload some code to the internal program SRAM and bring the carry flag (probably with some buffers/DFFs) out on one pad.

The really nice thing with using only ISA-visible resources is that you can precompute the "golden bitstring" with a high-level simulator, which is faster than simulating all the gates, including an LFSR. Two decades ago I saw a paper that advocated adding one or two instructions to bring BIST coverage to 100%, but that does not even seem required at this point.
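Here is a rough sketch of how the golden bitstring could be precomputed and packed into a Flash image. The tiny `simulate_carry_bits` loop is only a stand-in for a real ISA-level simulator running the test program, and the MSB-first bit order is an assumption to check against the Flash chip's datasheet and the comparator logic.

```python
def simulate_carry_bits(n, x=0, y=1, c=0, width=16):
    """Stand-in for an ISA-level simulator: n carry bits of an add-with-carry loop."""
    mask = (1 << width) - 1
    for _ in range(n):
        t = x + y + c
        x, y, c = y, t & mask, t >> width
        yield c


def pack_bits(bits):
    """Pack 0/1 values into bytes, MSB first, zero-padding the last byte."""
    out = bytearray()
    acc = 0
    count = 0
    for b in bits:
        acc = (acc << 1) | (b & 1)
        count += 1
        if count == 8:
            out.append(acc)
            acc = 0
            count = 0
    if count:
        out.append(acc << (8 - count))
    return bytes(out)


def golden_image(n_bits, **sim_args):
    """Build the byte image that would be programmed into the SPI Flash."""
    return pack_bits(simulate_carry_bits(n_bits, **sim_args))


if __name__ == "__main__":
    image = golden_image(1000)
    print(len(image), image[:8].hex())
```

The resulting bytes can then be written to a file (say, a hypothetical `golden.bin`) and programmed into the Flash with whatever tool you already use.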

Of course, #Libre Gates will be useful to quantify the test's coverage, and to evaluate whether one sequence of code is more efficient than another.

