close-circle
Close

Generate the Flash's contents with a recursive algorithm

A project log for DYPLED

a thin LED display in SIP module format, much better than TIL311s to visualise your computer buses

Yann Guidon / YGDES 08/18/2016 at 02:324 Comments

As I just sent the PCB to fab, I have a couple of week to wait...

But the design is not over ! The Flash chip must be programmed with a lookup table contained in a file, which must be generated from the pin assignations, and this is not as straight-forward as it seems !

Now let's consider this : the Flash is pretty large, a few megabytes. We can work "in memory" and scan an array in RAM then dump it to a file. But this will take a long time, and cause a lot of cache misses. Yes, I'm an over-optimiser but if you consider that I might implement the algorithm with bash, for extra geek points, a better and leaner approach seems desirable.

I have chosen a "streaming" algorithm that doesn't hold the whole contents in RAM, but writes it to a file as soon as each byte or word is computed.

It's not that complicated, the algorithm might work like this:

That's very nice but in practice, it's obviously inefficient because a lot of values will be computed over and over but they never change...

Quite a few things can be precomputed, such as the digit->output code conversion. However, because we are totally reshuffling the address bits, addresses/indices must be recomputed at every cycle.

This computation is actually a remapping of the bits so it's not so arbitrary. If I flip the bit #n, then the bit #m will be flipped in the logical index. This opens an opportunity to save a lot of lookups...

A traditional bit-shuffling routine would loop over all the input bits (let's say 21 because that's how many address lines we have), then for each bit, lookup what is the position of the corresponding logical bit. Since there are 21×2²¹ lookups, that's a long computation overall.

I have found how to cut this cost in half with a little, neat recursive trick. It does not use a loop counter, but a bit index counter. Starting at index 21, for example, the procedure function calls itself twice with the decremented index. So the procedure is called with index 21, but each time with a different parameter. As long as the index is not 0, the procedure calls itself twice, leading to 2²¹ calls, as expected.

Here is a first JavaScript example of a recursive counter:

<html>
 <head>
  <script>
   function start() {
    var pre=document.getElementById("out");
    var msg="Starting:\n"

    // permutation of the input bits
    var BinLUT= [ 4, 2, 5, 3, 6, 0, 1 ];

    function recurse( index) {
      msg+=" "+index;
      bit=BinLUT[index];
      if (index>0) {
        index--;
        recurse(index);
        recurse(index);
      }
      else
        msg+=" *\n";
    }
    recurse(6);

    pre.innerHTML=msg+"end!"
   }
  </script>
 </head>
<body onload="start()">
 <pre id="out">
  empty
 </pre>
</body>
</html>
The output shows that one half of the upper bits are not evaluated, in average:
Starting:
 6 5 4 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 4 3 2 1 0 *
 0 *
 1 0 *
 0 *
 2 1 0 *
 0 *
 1 0 *
 0 *
 3 2 1 0 *
.....

Now the magic is that the first call forwards its initial parameter (the logical index) but, before the second call, the index is updated with the right bit set to 1. This leads to only 2²⁰ lookups.In the following example, the code performs a bit reversal:
<html>
 <head>
  <script>
   var msg="Starting:\n"

   // permutation of the input bits
   var BinLUT=[ 3, 2, 1, 0 ];

   function recurse( index, logic) {
     msg+=" "+logic;
     if (index>0) {
       var bit=BinLUT[index--];
       recurse(index, logic);
       recurse(index, logic|(1<<bit) );
     }
     else
       msg+="\n";
   }

   function start() {
    var pre=document.getElementById("out");

    recurse(3, 0);
    pre.innerHTML=msg+"end!"
   }
  </script>
 </head>
<body onload="start()">
 <pre id="out">
  empty
 </pre>
</body>
</html>
Starting:
 0 0 0 0
 4
 2 2
 6
 1 1 1
 5
 3 3
 7
end!
As the recursion nears the end, more bits are set. But it works with any permutation, not just bit reversal.

Discussions

K.C. Lee wrote 08/18/2016 at 03:16 point

It is only a few MB worth of data, even my old router has more than enough RAM to cache the file I/O. You can even declare an array and write the whole thing to disk when you are done.

The  process of optimizing is a few time the order of magnitude higher than the one time overhead.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/18/2016 at 03:25 point

I know that it might sound overkill.

However, in this case, the bit permutations go in quite a few ways, as the inputs and outputs are shuffled... So even with a dumb big-array-based program, it's not easy to be sure that it works.

With the recursive approach, it's very easy to check that all permutations are done correcty, bit per bit, by changing only the initial parameter of the first recursive call.

Furthermore there might be other applications of this streaming permutation algorithm...

  Are you sure? yes | no

K.C. Lee wrote 08/18/2016 at 04:14 point

I wrote a file indexing program at one point.  It does the dumbest thing that they tell you not to do in computer science - linear search.  I figure that all the overly smart optimization so that search can be faster is pointless. File system keeps changing and as a result the indexing cost more than search.  I can do boolean expression partial match of 30MB worth in linear search (using strstr()) in a single thread in tens of milliseconds on a low end AMD PC that is easily 5 years out of date.  (It compiles the query expression into a small virtual machine. :)

21×2²¹ (44040192) lookups isn't that much.  Writing the code in dumbest C code by brute force would have so much faster more than carefully crafted scripts.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/18/2016 at 04:32 point

I know a bit about optimisation... But here I want to make something that is flexible and manageable with scripted languages, did I mention it could be written in bash, which is not a very optimised language ? :-D

So I'm looking for convenience, not ultrafast computations (I could but I wouldn't, just for a file that is generated once in a while). Furthermore, even though raw performance is not the requirement, it must be very easy to update the script and rerun it during development and adaptation (if I mistakenly swapped pins).

Oh and I save on memory allocation and block writes :-P I will just putchar()....

  Are you sure? yes | no