• Speeding up by 20%, profitability almost reached

    Maciej Witkowiak04/13/2021 at 15:36 2 comments

    Computing SHA256 is a challenge for 6502 CPU inside C64. The main problem is that this 8-bit CPU has only three 8-bit registers while 32-bit arithmetic is required. You need at least 4 instructions (one for every byte) for each 32-bit operation. In practice it will be closer to twelve: 4 times (load, change, store).

    Even though C64 has only 64KB of RAM, the primary way of speeding up programs is using more memory: unrolling loops and heavy use of lookup tables. For example for drawing pixels on the screen you could sacrifice some RAM to precalculate lookup tables for the memory address of every line of the screen.

    If you look on SHA256 pseudocode in Wikipedia there are only few basic operations needed: NOT, AND, XOR, addition and circular right bit shift rotation with varying shift length.

    Using lookup tables won't get us anywhere for anything except bit rotation. There is no benefit of using lookup tables for AND or XOR - it would take more time than the built in instructions.

    The right shift case is different because it depends on the shift length. The naïve and memory-conserving approach is to use a loop. This is what CC65 runtime does in the arithmetic shift to the right built-in function.

    But we need a circular shift - bits that fall out on the right end have to appear on the left side. So the actual operation is:

    uint32_t right_rot_gen(uint32_t value, uint8_t count) {
        return value >> count | value << (32 - count);
    }

     There are four 32-bit calculations here: two shifts, subtraction and OR.

    Can we do better? Yes.

    Read more »