Close

Operations: *, /

A project log for Bit-serial CPU based on crossbar switch

256 switches and few shift registers to implement a working 16 or 32-bit integer arithmetic calculator (+, -, *, /, isqrt, BCD / bin conv..)

zpekiczpekic 05/01/2022 at 02:090 Comments

These two operations are similar in following ways:

* (TOS:NOS <= TOS * NOS)

A classic shift-add algorithm is used, as described here for example. 

Microcode lines 360-390 implement the steps:

  1. prep_regs() subroutine moves TOS to R7 and clears TOS. TOS which is now 0, and NOS can be imagined as one long 2*n (e.g. 32 or 64 bit) register where the shift and add occurs. The "parameter" max passed in looks by syntax as a subroutine input parameter, but it is simply a shorthand for register assignment operation in the same microcode cycle which calls the subroutine. This way when first instruction of subroutine executes, the loopcnt register is set to n-1 (max value). Definition of subroutine can be seen around line 204. 
  2. Single rotation (n operations in subroutine above) also updated the zero bit for register NOS, so if this is set we can bail out as we have x * 0 = 0
  3. NOS is shifted down to get LSB, which lands in delay status flag. To preserve state of registers a shift up is executed right after (np_d2_d2 is followed by np_m2_m2)
  4. If LSB was 1, execute add step (m_add_r7)
  5. The addition can result in carry, so set delay status bit (which will be shifted into MSB of TOS), and proceed with shift of the 32/64 aggregate TOS/NOS register (m_shift1)
  6. If addition produced no carry, set delay bit to 0 so that will be shifted into MSB of TOS (m_shift0)
  7. m_shift: note how TOS and NOS are connected together along with delay status bit. A single rotate down step is executed. Because all other registers (R2...R7) get this step too, after n cycles they will be rotated back to original value, only NOS and TOS accumulating the result
  8. switch matrix connections are reset for next iteration, loopcnt is examined, if zero then iterations are done
  9. loop counter is decremented and proceed with step 3.

/ (TOS <= NOS div TOS, NOS <= NOS mod TOS)

The unsigned (restoring) binary division algorithm and conceptual design of the hardware for it are described here. The only difference is that all operations in the flow chart are executed serially so need n cycles. The beauty of this setup is that it simultaneously generates div and mod values. For clarity, even register names A, Q, M are used in the microcode (note that using .alias directive makes it easy to give alternative name to any symbol)

Microcode lines 490-530 implement the steps (note the subroutine is factored out so that can be used also as part of square root algorithm):

  1. prep_regs() subroutine described above is invoked to clear TOS after saving it to R7.
  2. if NOS was detected as zero, is is divide by zero case, so bail out to error routine (there is a special case when called from square root flagged by delay = 1)
  3. divloop: Q (NOS) and A (TOS) are connected into a 2*n long register and shifted up (m2_m2_np)
  4. A <= A - M step is executed. Most of this operation is factored out into a_pm_m() subroutine, but it is called with carry set and with M going through inverter. That way standard A <= A + !M + 1 is achived.
  5. If step 4 resulted in carry set then it means A >= M, so "M goes into A", and 1 can be shifted into Q
  6. To not keep other registers same, first all are rotated down (d2_d2_np), then delay bit is connected to Q and all rotated up (lines 515-522). 
  7. div_next: if loopcnt reached zero, means algorithm is done, so return is executed, otherwise decrement loopcnt and go to step 3
  8. restore_a: If step 4 resulted in no carry, means "M doesn't go into A" so we need to restore the value of A.  Again, a_pm_m() is used for this, but with carry zero, so A <= A + M + 0
  9. proceed with step 7

Discussions