1. System requirements

    The target computer's CPU is the 6502, which was widely used about 45 years ago. The total memory requirement for the system and user areas is 16 kB. The user interface is an RS232C serial terminal. Figure 1 shows the interpreter running.

                                                Fig.1  Floating point interpreter CI-2 running on my 6502 computer PERSEUS-8

2. Language specification

    The language specification is summarized below. It was not fully defined at the outset but is the result of trial and error.

    Numeric values are 32-bit single-precision floating-point numbers. To simplify the conversion of ASCII codes to numbers, BCD notation is used instead of binary. As a result, the number of significant digits and the numerical range are inferior to those of typical single-precision arithmetic systems. To avoid parsing, variables and instructions are single letters of the alphabet. Expressions use RPN (Reverse Polish Notation), again to avoid parsing.

    There are three one-dimensional array variables: )X, )Y, and )Z. The index ranges from 0 to 255. These arrays alone occupy 3 kB of memory (3 arrays x 256 elements x 4 bytes). There is no statement equivalent to a FOR loop; instead, loops are built from a conditional branch and a jump to a specified line number. The language is therefore not suited to large-scale programming.

3. Software configuration 

    Figure 2 shows the block diagram of this interpreter system. After startup, the system waits for one line of input. Once a line has been received from the serial interface, it is analyzed one character at a time from the left of the line buffer. If a line number is detected at the beginning of the line, control passes to the line editor. Otherwise, the line is parsed one character at a time in direct execution mode. A number is converted to BCD and stored in the floating-point register. An arithmetic operator or a command transfers control to its processing routine. When a ‘CR’ code is detected, the assignment is performed, the register value is displayed, and the system returns to waiting for a line of input.

    Program execution starts when a ‘!’ code is detected, which sets the execution flag. Once the flag is set, lines are analyzed and executed one at a time from the beginning of the program area. For a conditional branch statement, the program is searched from the top for the line whose number matches the jump destination, and execution continues from there.

4. Floating point expression

    Fig. 3 shows the floating-point representation used in this system. A floating-point variable consists of a 3-byte (24-bit) mantissa, a 7-bit exponent, and a 1-bit mantissa sign. I use BCD because it makes it easy to convert ASCII codes to numbers, and because the register values can be read intuitively on the binary LED display on the front panel of PERSEUS-8, as in the value examples of Fig. 3. On the 6502, BCD arithmetic is enabled simply by setting a flag (the decimal mode flag), so the implementation is simple.
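
    As a rough illustration of why BCD is convenient here, the following Python sketch packs six decimal digits into three bytes, two digits per byte. It is not the 6502 code: the function names are hypothetical, and the byte order shown (most significant digits first) is an assumption, since the actual layout is defined by Fig. 3.

```python
def pack_bcd_mantissa(digits: str) -> bytes:
    """Pack six decimal digits into three BCD bytes (two digits per byte)."""
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly six decimal digits")
    # High nibble holds the first digit of each pair, low nibble the second.
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, 6, 2)
    )


def unpack_bcd_mantissa(raw: bytes) -> str:
    """Recover the digit string by reading each nibble as one decimal digit."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in raw)


# Mantissa digits of the square root of 2 as displayed by the interpreter.
packed = pack_bcd_mantissa("141421")
print(packed.hex())
```

    Each byte reads directly as two decimal digits in hexadecimal, which is why a value can be checked by eye on a binary LED display.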

    The exponent is expressed as a complement rather than with the usual offset (bias), because I thought the values would be easier to read intuitively when debugging. However, in addition and subtraction I found it troublesome to compare the exponents of the two operands and make them equal when they are in complement form. Furthermore, on the 6502 a BCD complement is normally the complement of 100, but because the exponent here is only 7 bits wide, it had to be converted to the complement of 80.
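
    One possible reading of the "complement of 80" is sketched below in Python. It assumes that the 7-bit field holds a two-digit BCD value from 00 to 79 and that a negative exponent -n is stored as 80 - n; the range of -40 to +39 and the function names are assumptions made for this illustration, not taken from the actual code.

```python
EXP_MODULUS = 80  # assumption: 7-bit BCD field holding 00..79


def encode_exponent(e: int) -> int:
    """Encode a signed exponent as a 7-bit BCD value in complement-of-80 form."""
    if not -40 <= e <= 39:  # assumed representable range
        raise ValueError("exponent out of range")
    value = e % EXP_MODULUS  # negative e becomes 80 - |e|
    return ((value // 10) << 4) | (value % 10)  # two BCD digits, top bit unused


def decode_exponent(bcd: int) -> int:
    """Decode the 7-bit BCD field back to a signed exponent."""
    value = (bcd >> 4) * 10 + (bcd & 0x0F)
    return value if value < 40 else value - EXP_MODULUS


print(hex(encode_exponent(3)), hex(encode_exponent(-1)))
```

    Under this reading, comparing two exponents needs care because a small negative exponent is stored as a large field value, which matches the difficulty described above.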

5. Four arithmetic operations

    Figure 4 shows the memory map and register structure of this system. The 6502 machine instructions basically provide only 8-bit addition and subtraction for arithmetic. Therefore, the floating-point registers must be defined in memory, and the mantissa and exponent processing for all four arithmetic operations must be implemented in software.

    The RPN arithmetic stack has four levels, R0 to R3, and binary operations combine R1 and R0, for example R0 = R1 + R0 for addition. Three 48-bit registers are used to calculate the mantissa in multiplication and division: because the mantissa is 24 bits wide, the mantissa registers for multiplication need to be twice as wide, 48 bits. The nine registers R4 to R12 are used to calculate the elementary functions.
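
    The behaviour of the four-level stack can be sketched in Python as follows. This is only an illustration: the class and method names are hypothetical, and the way values move on push and after an operation (lift and drop, with R3 kept) is an assumption modelled on typical RPN calculators.

```python
import operator


class RpnStack:
    """Four-level RPN stack; r[0] is R0 and r[3] is R3."""

    def __init__(self):
        self.r = [0.0, 0.0, 0.0, 0.0]

    def push(self, value):
        # Stack lift (assumed): R2->R3, R1->R2, R0->R1, old R3 is lost.
        self.r = [value, self.r[0], self.r[1], self.r[2]]

    def binary(self, op):
        # R0 = R1 op R0, as in the text's example R0 = R1 + R0.
        result = op(self.r[1], self.r[0])
        # Stack drop (assumed): R2->R1, R3->R2, R3 keeps its value.
        self.r = [result, self.r[2], self.r[3], self.r[3]]
        return result


# Evaluate "3 4 + 5 *" in RPN order.
stack = RpnStack()
stack.push(3.0)
stack.push(4.0)
stack.binary(operator.add)
stack.push(5.0)
print(stack.binary(operator.mul))
```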

    Addition and subtraction require the exponents of the two operands to be equal. The exponents of R1 and R0 are compared, and the smaller one is changed to match the larger one. The mantissa of the operand with the smaller exponent is shifted right to compensate. The mantissas are then converted from 24-bit absolute value and sign to a 32-bit complement form, added or subtracted, and the result is converted back to absolute value and sign. The width is extended so that the carry is not lost.
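
    The alignment step described above can be sketched in Python with decimal integers standing in for the BCD mantissa. This is a simplified model, not the 6502 routine: the tuple layout, the convention value = sign * mantissa * 10**exponent, and the renormalisation loops are assumptions. Digits shifted out are simply truncated.

```python
DIGITS = 6  # six BCD digits in the 24-bit mantissa


def fp_add(a, b):
    """Add two numbers given as (sign, mantissa, exponent) tuples."""
    (sa, ma, ea), (sb, mb, eb) = a, b
    # Align: shift the mantissa with the smaller exponent right (truncating).
    if ea < eb:
        ma //= 10 ** (eb - ea)
        ea = eb
    elif eb < ea:
        mb //= 10 ** (ea - eb)
        eb = ea
    # Signed addition in a wider integer, so a carry is never lost.
    total = sa * ma + sb * mb
    sign = -1 if total < 0 else 1
    mant, exp = abs(total), ea
    # A carry produced a seventh digit: shift right and bump the exponent.
    while mant >= 10 ** DIGITS:
        mant //= 10
        exp += 1
    # Cancellation left leading zeros: shift left to renormalise.
    while mant and mant < 10 ** (DIGITS - 1):
        mant *= 10
        exp -= 1
    return (sign, mant, exp) if mant else (1, 0, 0)


# 100 + 1 with six-digit mantissas.
print(fp_add((1, 100000, -3), (1, 100000, -5)))
```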

    In multiplication, the 7-bit exponents of the two operands are added. The mantissas are stored in 48-bit registers, as shown in Fig. 5. The multiplier is shifted right and extracted 4 bits (one BCD digit) at a time, and the multiplicand is added to the product register that number of times. The multiplicand register is shifted left by one digit for each multiplier digit.
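
    The digit-by-digit mantissa multiplication can be modelled in Python as below. Decimal integers stand in for the BCD registers, so "shift right by 4 bits" becomes division by 10 and "shift left" becomes multiplication by 10. The function name is hypothetical and this is only a sketch of the idea.

```python
def bcd_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative mantissas by shift and repeated addition."""
    product = 0
    while multiplier > 0:
        digit = multiplier % 10      # extract the lowest digit (4 bits in BCD)
        multiplier //= 10            # shift the multiplier right by one digit
        for _ in range(digit):       # add the multiplicand 'digit' times
            product += multiplicand
        multiplicand *= 10           # shift the multiplicand left by one digit
    return product


# Two six-digit mantissas give a product of up to twelve digits (48 bits in BCD).
print(bcd_multiply(141421, 141421))
```

    At most nine additions are needed per digit, so a six-digit multiplier costs no more than 54 additions of the wide register.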

    In division, the 7-bit exponents of the two operands are subtracted. The mantissas are placed in registers as shown in Fig. 5. I used restoring division here. The divisor is subtracted from the dividend repeatedly for as long as the result stays non-negative. When the result goes negative, the divisor is added back once to restore a non-negative remainder. The number of successful subtractions is placed in the lowest digit of the quotient register. For the next digit, the divisor is shifted four bits to the right, the quotient register is shifted four bits to the left, and the process is repeated.
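
    A Python model of this restoring division is sketched below, again with decimal integers in place of BCD registers. The initial scaling of both operands into the upper part of a wide register and the choice of seven quotient digits are assumptions for the illustration; the sketch expects the dividend to be less than ten times the divisor, as is the case for normalised mantissas.

```python
def restoring_divide(dividend: int, divisor: int, digits: int = 7) -> int:
    """Return 'digits' quotient digits of dividend / divisor by restoring division."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expected dividend >= 0 and divisor > 0")
    # Place both operands in the upper part of a wide register (assumed layout).
    remainder = dividend * 10 ** (digits - 1)
    shifted = divisor * 10 ** (digits - 1)
    quotient = 0
    for _ in range(digits):
        count = 0
        while True:
            remainder -= shifted
            if remainder < 0:
                remainder += shifted  # restore: add the divisor back once
                break
            count += 1                # one more successful subtraction
        quotient = quotient * 10 + count  # shift quotient left, append the digit
        shifted //= 10                    # shift the divisor right by one digit
    return quotient


# 2.00000 / 1.41421: the quotient digits start 1414217.
print(restoring_divide(200000, 141421))
```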

6. Elementary functions

    The square root function is calculated using Newton's method. To ensure fast convergence even for large numbers, the initial value is obtained by halving the exponent of the argument.
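
    The idea can be sketched in Python with ordinary floats. The decimal exponent is obtained here with math.log10 purely for illustration (in the interpreter it is available directly from the floating-point format), and the iteration count of 8 is an assumption, not the value used in CI-2.

```python
import math


def sqrt_newton(a: float, iterations: int = 8) -> float:
    """Square root by Newton's method, starting from 10**(exponent // 2)."""
    if a < 0.0:
        raise ValueError("negative argument")
    if a == 0.0:
        return 0.0
    exponent = math.floor(math.log10(a))  # decimal exponent of the argument
    x = 10.0 ** (exponent // 2)           # initial value: halve the exponent
    for _ in range(iterations):
        x = 0.5 * (x + a / x)             # Newton step for x*x - a = 0
    return x


print(sqrt_newton(2.0))
```

    With this starting value the guess is within about a factor of ten of the true root for any magnitude of argument, so the number of iterations does not grow with the size of the number.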

    The sine function uses a Maclaurin expansion up to the ninth order. The coefficients are taken from a table. The argument is reduced to the range from zero to pi/2, and the result is transformed accordingly. The cosine function is calculated as the sine of the argument plus pi/2. The tangent function is obtained as sine/cosine.
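
    A Python sketch of a ninth-order Maclaurin sine with argument reduction follows. The reduction steps and the names are assumptions for illustration; only the series order and the use of a coefficient table come from the description above.

```python
import math

# Coefficients of x, x^3, x^5, x^7, x^9 (1/1!, -1/3!, 1/5!, -1/7!, 1/9!).
SIN_COEFFS = (1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0, 1.0 / 362880.0)


def sin_maclaurin9(x: float) -> float:
    """Sine via a ninth-order Maclaurin series on an argument in [0, pi/2]."""
    x = math.fmod(x, 2.0 * math.pi)      # bring the argument into (-2pi, 2pi)
    if x < 0.0:
        x += 2.0 * math.pi               # now in [0, 2pi]
    sign = 1.0
    if x >= math.pi:                     # sin(x) = -sin(x - pi)
        x -= math.pi
        sign = -1.0
    if x > math.pi / 2.0:                # sin(x) = sin(pi - x)
        x = math.pi - x
    x2 = x * x
    series = 0.0
    for c in reversed(SIN_COEFFS):       # Horner evaluation in x^2
        series = series * x2 + c
    return sign * x * series


def cos_maclaurin9(x: float) -> float:
    """Cosine as the sine of the argument plus pi/2."""
    return sin_maclaurin9(x + math.pi / 2.0)


print(sin_maclaurin9(math.pi / 6.0), cos_maclaurin9(0.0))
```

    The first omitted term, x^11/11!, is roughly 4e-6 at x = pi/2, so a truncation error of a few units in the sixth digit is to be expected from the series alone.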

    The arcsine function is obtained by the bisection method using the sine function. The arccosine is obtained by subtracting the arcsine value from pi/2. The arctangent function is obtained by the bisection method using the tangent function.
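
    Bisection on the sine function can be sketched in Python as follows. The search interval, the iteration count, and the use of math.sin in place of the interpreter's own sine routine are assumptions made to keep the example short.

```python
import math


def asin_bisect(y: float, iterations: int = 40) -> float:
    """Arcsine by bisection: find x in [-pi/2, pi/2] with sin(x) = y."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("argument must be in [-1, 1]")
    lo, hi = -math.pi / 2.0, math.pi / 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.sin(mid) < y:   # sine is increasing on this interval
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def acos_bisect(y: float) -> float:
    """Arccosine as pi/2 minus the arcsine."""
    return math.pi / 2.0 - asin_bisect(y)


print(asin_bisect(0.5), acos_bisect(0.5))
```

    Each iteration halves the interval, so about 20 iterations would already be enough for six decimal digits; the price is one sine evaluation per iteration.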

    The exponential function is obtained by the Maclaurin expansion up to the ninth order. However, a good approximation is only possible when the argument is near zero. Therefore, as shown in Fig. 6, the function is decomposed into a product of exponential values weighted by powers of two and the exponential of the fractional part. Only the fractional part is used as the argument of the Maclaurin expansion.
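
    One way to read this decomposition is sketched below in Python: the integer part of the argument is split into binary weights, each contributing a tabulated factor e^(2^k), and only the fractional part goes through the series. The table size, the handling of negative arguments, and all names are assumptions; the actual scheme is the one shown in Fig. 6.

```python
import math

# Assumed table of e**(2**k) for k = 0..6 (constants in a real implementation).
EXP_POW2 = [math.exp(2.0 ** k) for k in range(7)]


def exp_maclaurin9(f: float) -> float:
    """Maclaurin series of e**f up to the ninth-order term."""
    term, total = 1.0, 1.0
    for n in range(1, 10):
        term *= f / n        # builds f**n / n! incrementally
        total += term
    return total


def exp_sketch(x: float) -> float:
    """e**x as (product of e**(2**k) factors) * e**(fractional part)."""
    if x < 0.0:
        return 1.0 / exp_sketch(-x)   # assumed handling of negative arguments
    n = int(x)                        # integer part
    f = x - n                         # fractional part, 0 <= f < 1
    if n >= 2 ** len(EXP_POW2):
        raise OverflowError("argument too large for the table")
    result = exp_maclaurin9(f)
    k = 0
    while n:
        if n & 1:                     # this power of two is present in n
            result *= EXP_POW2[k]
        n >>= 1
        k += 1
    return result


print(exp_sketch(5.3))
```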

    The natural logarithm function also uses the Maclaurin expansion up to the ninth order, but the function is decomposed into a sum of several terms so that the argument of the Maclaurin expansion lies between 1.0 and 2.0. The other terms are taken from a table. The hyperbolic functions are obtained using the exponential function.

7. Line editor

    When a single-line input begins with a four-digit line number followed by a ':' code, it is automatically recognized as a program line and inserted into the program. If the same line number already exists in the program, the new line replaces it. If nothing follows the line number and ':' code, the line with that number is deleted from the program.
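
    These three rules (insert, replace, delete) can be sketched in Python as below. Storing the program in a dictionary keyed by line number and listing it in ascending order are assumptions for illustration; the real editor works on a program area in memory. The line bodies "A", "B", and "C" are placeholders, not CI-2 statements.

```python
def edit_line(program: dict, line: str) -> bool:
    """Apply one input line to the program; return False if it is not a program line."""
    # A program line starts with four digits followed by ':'.
    if len(line) < 5 or not line[:4].isdigit() or line[4] != ":":
        return False                      # would be handled in direct execution mode
    number = int(line[:4])
    body = line[5:]
    if body.strip() == "":
        program.pop(number, None)         # nothing after ':' deletes the line
    else:
        program[number] = body            # insert a new line or replace an old one
    return True


def listing(program: dict) -> list:
    """Return the program lines in ascending line-number order."""
    return [f"{n:04d}:{program[n]}" for n in sorted(program)]


program = {}
edit_line(program, "0020:B")
edit_line(program, "0010:A")
edit_line(program, "0010:C")   # replaces line 0010
edit_line(program, "0020:")    # deletes line 0020
print(listing(program))
```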

8. Development environment

    This interpreter was developed by hand assembly. The machine code was entered using the toggle switches on the front panel of the PERSEUS-8, and each routine, up to about 200 bytes long, was debugged using the single-step function. I also inserted jump instructions into the program to trap execution and examine memory values. At runtime, the system area is switched to a write-protected state, so that even if a bug causes the program to run out of control, the program is not damaged.

    Figure 7 shows an example of debugging. The 16-bit address toggle switches on the front panel of the PERSEUS-8 are set to 0010h, which specifies the least significant byte of the mantissa of the 32-bit floating-point register R0, and the 8-bit data LEDs display 21 in BCD. Meanwhile, the mantissa of the square root of 2 displayed on the serial terminal is 1.41421, whose two least significant digits, 21, match. In this way, I can check the values of the internal registers.

                                                                 Fig. 7  Debugging while checking the memory value.

9. Current results

    Currently, the code size of this interpreter, listed in the attached file (ASSEMBLY_CODE_CI-2_V1_1_0.pdf), is about 6.3 kB. Of that, floating-point arithmetic and numerical input/output together take 1.7 kB, the analysis and execution part 1.7 kB, the elementary functions 2.4 kB, and the editor 0.5 kB. Quite a few bugs probably remain. Since the parser has no error detection, the result of syntactically incorrect input cannot be guaranteed.

    As for the accuracy of the four floating-point arithmetic operations, all six digits were correct. However, there is no rounding using a guard bit; results are truncated in this system. As an example of function accuracy, the sine function was accurate to about +/-4 in the sixth digit. At first, I tried to calculate the functions using the CORDIC (Coordinate Rotation Digital Computer) algorithm, but the loss of significant digits during subtraction prevented good results, so I decided to use mainly the Maclaurin expansion for function calculation.

    Measured in a user program, arithmetic on this system took 2.0 ms for an addition and 4.4 ms for a multiplication, which is about 1/100,000 of the execution speed of a standard personal computer made in 2020.

    As an example application, an FFT (Fast Fourier Transform) [2] program and its execution result are shown in the attached file (FFT_02_V_1_1.pdf). The bottom part of the attached file shows the generated input, a 32-point sine wave with four periods plus a DC component, and POWER SPECTRUM shows the result of its Fourier transform. The large peaks in the result at periods 0 and 4 show that the frequency spectrum of the input waveform is calculated correctly.
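
    The expected shape of this result can be checked with a short Python sketch. Note that it uses a direct DFT rather than an FFT, and that the amplitudes of the sine wave and the DC component (both 1.0) are assumptions; it is not a translation of the CI-2 program in the attached file.

```python
import cmath
import math

N = 32
# Four periods of a sine wave over 32 points, plus a DC component (assumed amplitudes).
signal = [1.0 + math.sin(2.0 * math.pi * 4 * n / N) for n in range(N)]


def power_spectrum(x):
    """Power |X[k]|^2 for k = 0..N/2, computed with a direct DFT."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


power = power_spectrum(signal)
# Indices whose power stands clearly above numerical noise.
peaks = [k for k, p in enumerate(power) if p > 1.0]
print(peaks)
```

    Only the DC term and the fourth harmonic carry power; every other index is zero up to rounding error.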

10. Demo video

    In this video, I demonstrate arithmetic operations and the natural logarithm function in CI-2's direct mode. In program execution mode, I run a sample program that uses exponential functions. The video also shows how to add a line that controls program execution using a pointer variable. There is no audio explanation, so please turn on the subtitles.

11. Making ROMs of the interpreter

    The interpreter runs from the computer's battery-backed RAM, but it can also run from ROM. The ROMs were programmed using the programmer presented in my other project, Fully manual PROM programmer. Four D2716-type 2 kB EPROMs store the entire interpreter in 8 kB of capacity. Programming them manually took 20 hours in total. Figure 8 shows the programmed ROMs mounted on the computer PERSEUS-8.

                                                          Fig. 8  ROMs of the interpreter

    A toggle switch on the board selects whether the interpreter is executed from ROM or from RAM. When running from ROM, the single-step operation mode of PERSEUS-8 does not work with the 2 MHz system clock. This is because the 450 ns access time of this PROM is slow compared with 100 ns for the SRAM.

References

[1] MCS6500 MICROPROCESSORS PRELIMINARY DATA SHEET, MOS TECHNOLOGY, INC. MAY, 1976.

[2] Steven W. Smith, The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Pub., 1997.

(Rev. Nov. 04, 2021)