Close

Operations: Q (integer square root)

A project log for Bit-serial CPU based on crossbar switch

256 switches and few shift registers to implement a working 16 or 32-bit integer arithmetic calculator (+, -, *, /, isqrt, BCD / bin conv..)

zpekiczpekic 05/08/2022 at 02:360 Comments

With integer division, comparison, subtraction already implemented, it was possible to squeeze in integer square root as the algorithm can reuse these operations. Note:

The algorithm uses Heron's method described here. For implementation reference, see  to microcode lines 293 to 322 approx.

  1. First, argument in TOS ("S") is duplicated on the stack, dup() subroutine is used for this
  2. TOS is then shifted down (halved)
  3. If zero flag of TOS is set, we are trying isqrt(0) so bail with result 0 (but after 1 stack pop to not leave already duplicated argument in NOS)
  4. Note that 1 shifted down is also 0, which would give false result isqrt(1) = 0 - this is caught in div() subroutine by examining the delay status bit
  5. Now the stack is prepared to contain: X0, S, *, *, *, *, X0 - where X0 = S/2
  6. heron_step() is invoked to calculate the first iterative guess ("X1") which appears in TOS upon return
  7. root_loop: new "X1" and old "X0" iteration results are compared. This boils down to comparison between TOS and R7, which is a simple subtract with result not stored anywhere, just carry bit examined 
  8. if carry flag is set (X1 >= X0), iterations are over and result in R7 is copied to TOS, bail.
  9. root_cont: "new" X1 value becomes "old" X0 (which means R7 <= TOS)
  10. new X1 guess is obtained using heron_step() and proceed to step 7. 

heron_step() subroutine implements the following calculation:

x1 = ( x0 + s / x0 ) / 2;

which is:

TOS <= (NOS + DIV(NOS, R7)) >> 1

// used as step in integer square root
heron_step:    nuke, matrix_nop1();
        connect row_r7 to col_tos, div2(max, set);     // X0, S, S, ?, ?, ?, ?, X0
        divmod(same);                                // mod, div, S, ?, ?, ?, ?, X0
        nuke, matrix_nop1();
        disconnect row_nos from col_nos;
        connect row_r2 to col_nos;                    // NOS <= R2 (S)
        connect row_nos to col_adc1;
        connect row_r7 to col_adc2;                    // TOS <= R7 + NOS 
        connect row_sum to col_tos, c_flag <= zero, div2(max, set);
        // NOTE terrible fall-through!! 
            
// shift TOS >> 1            
shift_down:    prep_shift();
shift_dloop:    STATUS = busy_using_mt, opr = m2_m2_np, z_flags <= update, d_flag <= column, if bitcnt_is_zero then return;
        bitcnt <= dec, goto shift_dloop;    

Discussions