Generate the Flash's contents with a recursive algorithm

As I just sent the PCB to fab, I have a couple of week to wait...

But the design is not over ! The Flash chip must be programmed with a lookup table contained in a file, which must be generated from the pin assignations, and this is not as straight-forward as it seems !

Now let's consider this : the Flash is pretty large, a few megabytes. We can work "in memory" and scan an array in RAM then dump it to a file. But this will take a long time, and cause a lot of cache misses. Yes, I'm an over-optimiser but if you consider that I might implement the algorithm with bash, for extra geek points, a better and leaner approach seems desirable.

I have chosen a "streaming" algorithm that doesn't hold the whole contents in RAM, but writes it to a file as soon as each byte or word is computed.

It's not that complicated, the algorithm might work like this:

Loop 2Mi times (as many as words)
For each word:
- compute the logical index from the physical index (the counter)
- compute the value corresponding to the logical index
- translate the value into an output code
- write the code to the output

That's very nice but in practice, it's obviously inefficient because a lot of values will be computed over and over but they never change...

Quite a few things can be precomputed, such as the digit->output code conversion. However, because we are totally reshuffling the address bits, addresses/indices must be recomputed at every cycle.

This computation is actually a remapping of the bits so it's not so arbitrary. If I flip the bit #n, then the bit #m will be flipped in the logical index. This opens an opportunity to save a lot of lookups...

A traditional bit-shuffling routine would loop over all the input bits (let's say 21 because that's how many address lines we have), then for each bit, lookup what is the position of the corresponding logical bit. Since there are 21×2²¹ lookups, that's a long computation overall.

I have found how to cut this cost in half with a little, neat recursive trick. It does not use a loop counter, but a bit index counter. Starting at index 21, for example, the procedure function calls itself twice with the decremented index. So the procedure is called with index 21, but each time with a different parameter. As long as the index is not 0, the procedure calls itself twice, leading to 2²¹ calls, as expected.

Here is a first JavaScript example of a recursive counter:

<html>
 <head>
  <script>
   function start() {
    var pre=document.getElementById("out");
    var msg="Starting:\n"

    // permutation of the input bits
    var BinLUT= [ 4, 2, 5, 3, 6, 0, 1 ];

    function recurse( index) {
      msg+=" "+index;
      bit=BinLUT[index];
      if (index>0) {
        index--;
        recurse(index);
        recurse(index);
      }
      else
        msg+=" *\n";
    }
    recurse(6);

    pre.innerHTML=msg+"end!"
   }
  </script>
 </head>
<body onload="start()">
 <pre id="out">
  empty
 </pre>
</body>
</html>

The output shows that one half of the upper bits are not evaluated, in average:

Starting:
 6 5 4 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 4 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 3 2 1 0 *
.....

Now the magic is that the first call forwards its initial parameter (the logical index) but, before the second call, the index is updated with the right bit set to 1. This leads to only 2²⁰ lookups.In the following example, the code performs a bit reversal:

<html>
 <head>
  <script>
   var msg="Starting:\n"

   // permutation of the input bits
   var BinLUT=[ 3, 2, 1, 0 ];

   function recurse( index, logic) {
     msg+=" "+logic;
     if (index>0) {
       var bit=BinLUT[index--];
       recurse(index, logic);
       recurse(index, logic|(1<<bit) );
     }
     else
       msg+="\n";
   }

   function start() {
    var pre=document.getElementById("out");

    recurse(3, 0);
    pre.innerHTML=msg+"end!"
   }
  </script>
 </head>
<body onload="start()">
 <pre id="out">
  empty
 </pre>
</body>
</html>

Starting:
 0 0 0 0
 4
 2 2
 6
 1 1 1
 5
 3 3
 7
end!

As the recursion nears the end, more bits are set. But it works with any permutation, not just bit reversal.

Discussions

K.C. Lee wrote 08/18/2016 at 03:16

It is only a few MB worth of data, even my old router has more than enough RAM to cache the file I/O. You can even declare an array and write the whole thing to disk when you are done.

The process of optimizing is a few time the order of magnitude higher than the one time overhead.

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/18/2016 at 03:25

I know that it might sound overkill.

However, in this case, the bit permutations go in quite a few ways, as the inputs and outputs are shuffled... So even with a dumb big-array-based program, it's not easy to be sure that it works.

With the recursive approach, it's very easy to check that all permutations are done correcty, bit per bit, by changing only the initial parameter of the first recursive call.

Furthermore there might be other applications of this streaming permutation algorithm...

Are you sure? yes | no

K.C. Lee wrote 08/18/2016 at 04:14

I wrote a file indexing program at one point. It does the dumbest thing that they tell you not to do in computer science - linear search. I figure that all the overly smart optimization so that search can be faster is pointless. File system keeps changing and as a result the indexing cost more than search. I can do boolean expression partial match of 30MB worth in linear search (using strstr()) in a single thread in tens of milliseconds on a low end AMD PC that is easily 5 years out of date. (It compiles the query expression into a small virtual machine. :)

21×2²¹ (44040192) lookups isn't that much. Writing the code in dumbest C code by brute force would have so much faster more than carefully crafted scripts.

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/18/2016 at 04:32

I know a bit about optimisation... But here I want to make something that is flexible and manageable with scripted languages, did I mention it could be written in bash, which is not a very optimised language ? :-D

So I'm looking for convenience, not ultrafast computations (I could but I wouldn't, just for a file that is generated once in a while). Furthermore, even though raw performance is not the requirement, it must be very easy to update the script and rerun it during development and adaptation (if I mistakenly swapped pins).

Oh and I save on memory allocation and block writes :-P I will just putchar()....

Are you sure? yes | no

Another module

Bit shuffling: what goes where ?

Discussions

Become a Hackaday.io Member