BASIC Assembler Fix

A project log for Model 100 Assembler

Development of an assembler for the Tandy Model 100 portable computer.

clintr, 09/12/2014 at 03:42

Well, it turns out that BASIC strings can be at most 255 bytes long. I was trying to keep all of the labels from an assembly file in one BASIC string, which meant I couldn't have very many labels. Below is an updated BASIC assembler which fixes this problem by storing the labels in arrays (turns out BASIC has arrays...).

Here is the updated BASIC assembler:

100 GOTO1000
200 PRINT"err near line",CL!
210 CLOSE1
220 END
1000 CLEAR2048:PS%=1:LM!=65535:HM!=0
1005 GOSUB2000
1007 PRINT"pass 1"
1020 GOSUB10000
1030 CLOSE1
1120 PRINT"bounds error":PRINT"LO =",LM!:PRINT"HI =",HM!
1130 END
1140 PS%=2:CL!=1:BF%=0
1145 PRINT"pass 2"
1160 GOSUB10000
1170 CLOSE1
1180 PRINT"success":PRINT"LO =",LM!:PRINT"HI =",HM!
1190 END
2000 A1$=""
2010 A2$=".adc.136.add.128.ana.160.cmp.184.ora.176.sbb.152.sub.144.xra.168"
2020 A3$=".dcr.5.inr.4"
2030 A4$=".dad.9.dcx.11.inx.3.ldax.10.pop.193.push.197.stax.2"
2040 A5$=""
2050 A6$=""
2060 B2$=".a.7.b.0.c.1.d.2.e.3.h.4.l.5.m.6"
2070 B3$=".a.56.b.0.c.8.d.16.e.24.h.32.l.40.m.48"
2080 B4$=".b.0.d.16.h.32.sp.48.psw.48"
2090 B5$=""
2100 DIMLB$(160)
2110 DIMLB!(160)
2120 MI%=0
2300 GOSUB2700
2320 IFNM!>255THENGOTO200
2330 BB!=NM!:GOTO2800
2400 GOSUB9000
2406 IFNL%<>0THENGOTO200
2410 CC$=LEFT$(TK$,1):GOSUB8100
2420 IFSC%=0THENGOTO2470
2430 NB$=TK$:GOSUB5000
2470 SS$=TK$:GOTO12000
2500 GOSUB2700
2570 GOSUB2600
2580 BB!=MD!:GOSUB2800
2590 BB!=DV!:GOTO2800
2600 IFNM!<=32767THENGOTO2620
2610 N!=NM!-32768:DV!=128:GOTO2630
2620 DV!=0:N!=NM!
2630 DV!=DV!+(N!\256):MD!=N!MOD256
2700 GOSUB2400
2740 TK$="0":NM!=0:RETURN
2800 IFPS%=1THENGOTO2850
2810 POKEPC!,BB!
2830 PC!=PC!+1:RETURN
2860 HM!=PC!
2880 LM!=PC!
2890 PC!=PC!+1:RETURN
3000 P%=INSTR(MP$,"."+SS$+".")
3010 IFP%<>0GOTO3040
3020 SC%=0:RETURN
3040 T$=MID$(MP$,P%+2+LEN(SS$))
3041 P%=INSTR(T$,"."):IF P%=0 GOTO 3050
3048 T$=MID$(T$,1,P%-1)
3050 NM!=VAL(T$):SC%=1:RETURN
3200 GOSUB9000
3210 CC$=LEFT$(TK$,1):GOSUB8000
3230 SS$=TK$:GOSUB3000
5000 N$=NB$:SC%=0:NM!=0
5010 K%=LEN(N$)-1
5030 A$=RIGHT$(N$,1)
5050 N$=LEFT$(N$,K%)
5060 FORI%=1TOK%
5080 A$=MID$(N$,I%,1)
5090 X%=ASC(A$)
5100 IFX%>=48ANDX%<=57GOTO5130
5110 IFX%>=97ANDX%<=102GOTO5140
5130 X%=X%-48:GOTO5150
5140 X%=X%-87
5150 NM!=NM!*16
5160 NM!=NM!+X%
5170 NEXT
5180 SC%=1:RETURN
8000 A%=ASC(CC$):SC%=1
8050 SC%=0:RETURN
8100 A%=ASC(CC$):SC%=1
8130 SC%=0:RETURN
8200 GOSUB8000
8220 GOTO8100
8300 SC%=1
8305 IFBF%<>0THENGOTO8345
8306 NL%=0
8320 CB$=INPUT$(1,1)
8322 C%=ASC(CB$)
8326 NL%=1:CL!=CL!+1:RETURN
8345 BF%=0:RETURN
8400 SC%=1:A%=ASC(CC$)
8430 SC%=0:RETURN
9000 TK$=""
9010 GOSUB8300
9015 IFNL%<>0THENGOTO9430
9030 CC$=CB$
9040 GOSUB8400
9060 IFSC%<>0THENGOTO9010
9070 IFCB$<>";"THENGOTO9200
9100 GOSUB8300
9105 IFNL%<>0THENGOTO9430
9150 GOTO9100
9200 IFCB$<>","THENGOTO9220
9220 CC$=CB$
9222 GOSUB8200
9231 BF%=1
9240 GOSUB 8300
9245 IFNL%<>0THENGOTO9440
9250 CC$=CB$
9260 GOSUB8200
9270 IFSC%=0THENGOTO9300
9280 TK$=TK$+CB$
9290 GOTO9240
9300 IFCB$<>":"THENGOTO9340
9320 TK$=TK$+CB$
9350 BF%=1
9380 GOSUB8400
9440 BF%=1:RETURN
9700 GOSUB9000
9710 IFNL%<>0ORTK$<>","THENGOTO200
10000 CL!=1:BF%=0
10010 GOTO10050
10030 GOSUB9000
10040 IFNL%=0THENGOTO200
10050 GOSUB9000
10055 IFNL%<>0THENGOTO10050
10060 TT$=TK$:SS$=TT$:MP$=A1$:GOSUB3000
10070 IFSC%=0THENGOTO10100
10080 BB!=NM!:GOSUB2800
10090 GOTO10030
10100 SS$=TT$:MP$=A2$:GOSUB3000
10110 IFSC%=0THENGOTO10150
10120 OC!=NM!:MP$=B2$:GOSUB3200
10130 BB!=OC!:GOSUB2800
10140 GOTO10030
10150 SS$=TT$:MP$=A3$:GOSUB3000
10160 IFSC%=0THENGOTO10200
10170 OC!=NM!:MP$=B3$:GOSUB3200
10180 GOTO10130
10200 SS$=TT$:MP$=A4$:GOSUB3000
10210 IFSC%=0THENGOTO10250
10220 OC!=NM!:MP$=B4$:GOSUB3200
10230 GOTO10130
10250 SS$=TT$:MP$=A5$:GOSUB3000
10260 IFSC%=0THENGOTO10300
10270 BB!=NM!:GOSUB2800
10280 GOSUB2300
10290 GOTO10030
10300 SS$=TT$:MP$=A6$:GOSUB3000
10310 IFSC%=0THENGOTO10350
10320 BB!=NM!:GOSUB2800
10330 GOSUB2500
10340 GOTO10030
10350 IFTT$<>"lxi"THENGOTO10400
10360 OC!=1:MP$=B4$:GOSUB3200
10370 BB!=OC!:GOSUB2800
10375 GOSUB9700
10380 GOSUB2500
10390 GOTO10030
10400 IFTT$<>"mov"THENGOTO10450
10410 OC!=64:MP$=B3$:GOSUB3200
10415 GOSUB9700
10420 MP$=B2$:GOSUB3200
10430 GOTO10130
10450 IFTT$<>"mvi"THENGOTO10500
10460 OC!=6:MP$=B3$:GOSUB3200
10470 BB!=OC!:GOSUB2800
10475 GOSUB9700
10480 GOSUB2300
10490 GOTO10030
10500 IFTT$<>"rst"THENGOTO10550
10510 OC!=199:GOSUB2700
10520 OC!=OC!+8*NM!
10530 GOTO10130
10550 IFTT$<>"org"THENGOTO10595
10560 GOSUB2700
10570 PC!=NM!
10590 GOTO10030
10595 A$=RIGHT$(TT$,1):IFA$=":"THENGOTO11110
10600 A$=LEFT$(TT$,1):SS$=MID$(TT$,2):MP$=B5$
10610 IFA$<>"r"THENGOTO10650
10620 OC!=192:TK$=SS$:GOSUB3230
10630 GOTO10130
10650 IFA$<>"c"THENGOTO10700
10660 OC!=196:TK$=SS$:GOSUB3230
10670 BB!=OC!:GOSUB2800
10680 GOSUB2500
10690 GOTO10030
10700 IFA$<>"j"THENGOTO10750
10710 OC!=194:TK$=SS$:GOSUB3230
10720 GOTO10670
10760 IFA$<>"d"THENGOTO200
10770 B$=MID$(TT$,2)
10800 IFB$<>"s"THENGOTO10890
10810 GOSUB9000
10820 IFNL%<>0THENGOTO200
10830 CC$=LEFT$(TK$,1):GOSUB8100
10840 IFSC%=0THENGOTO200
10850 NB$=TK$:GOSUB5000
10860 IFSC%=0THENGOTO200
10870 PC!=PC!+NM!
10880 GOTO10030
10890 IFB$<>"b"THENGOTO10950
10900 GOSUB2300
10910 GOSUB9000
10920 IFNL%<>0THENGOTO10050
10930 IFTK$<>","THENGOTO200
10940 GOTO10900
10950 IFB$<>"w"THENGOTO200
10960 GOSUB2500
10970 GOSUB9000
10980 IFNL%<>0THENGOTO10050
10990 IFTK$<>","THENGOTO200
11000 GOTO10960
11110 IFPS%=2THENGOTO10050
11120 B%=LEN(TT$)-1:A$=LEFT$(TT$,B%):GOSUB12500
11130 GOTO10050
12000 SC%=0
12020 FORI%=1TOMI%
12030 IFLB$(I%)<>SS$THENGOTO12060
12040 NM!=LB!(I%):SC%=1
12050 RETURN
12060 NEXT
12070 RETURN
12500 IFMI%>=160THENGOTO200
12510 MI%=MI%+1
12520 LB$(MI%)=A$
12530 LB!(MI%)=PC!
12540 RETURN

Here is the documentation:

This is BASIC code for an assembler for the Model 100. It is a very limited assembler, meant only to be used to create a better assembler written in assembly.


1977,1978 Intel Corporation

This assembler only understands lowercase.

It understands only the following:

- usual 8085 opcodes
- numbers in hexadecimal only
  - these must begin with a decimal digit and end with 'h', e.g. 0ffh
- the usual operands a b c d e h l m sp psw
- labels
- immediate operands may be labels or hex numbers only
- assembler directives:
  - org
  - end -- everything after "end" in the file is ignored; this directive is required
  - db -- byte data as hex numbers only
  - dw -- word data as hex numbers only
  - ds -- the number of bytes may only be given as a hex number, not a label
- comments -- from ';' to the end of the line

The assembly code is converted to machine code directly in the Model 100 RAM.

You could then use the BASIC keyword SAVEM to put the machine code in a file.

There are errors that this assembler won't catch; try not to make any.

*** Program Documentation ***

Before running the assembler:

- Your input file must start with an org directive.
- The assembler will only write to RAM from HIMEM to MAXRAM-1, so you need to make sure your program will fit in that space. See the documentation on the BASIC CLEAR keyword for help with this.

** 1000 main

The assembler makes two passes of the file. The first pass makes sure that all writes to RAM will be within the range from HIMEM to MAXRAM-1, and determines the values of all the labels. The second pass writes the program to RAM.

The main program starts at line 1000, and calls the subroutine at line 10000 once for each pass.
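The reason for two passes is forward references: a label can be used before the line that defines it. The following Python sketch illustrates that split under a big simplifying assumption, namely a toy input language of only `label:`, `db` and `dw` (the names `assemble` and `parse_hex` are hypothetical, and this is not a translation of the BASIC program). Pass 1 only advances the program counter and records label addresses; pass 2 emits bytes.

```python
def parse_hex(tok):
    """Hex numbers start with a digit and end in 'h', e.g. "0ffh" -> 255."""
    return int(tok[:-1], 16)


def assemble(lines, org):
    """Toy two-pass assembler (hypothetical helper, not the BASIC program).

    Understands only 'label:', 'db <hex>' and 'dw <hex or label>'.
    Returns the assembled bytes and the label table.
    """
    labels = {}
    out = bytearray()
    for pass_no in (1, 2):
        pc = org                                  # each pass restarts at org
        for line in lines:
            toks = line.split(";")[0].split()     # drop comment, split on spaces
            if toks and toks[0].endswith(":"):
                if pass_no == 1:
                    labels[toks[0][:-1]] = pc     # pass 1: record label address
                toks = toks[1:]
            if not toks:
                continue
            op, arg = toks[0], toks[1]
            if op == "db":
                if pass_no == 2:
                    out.append(parse_hex(arg))    # pass 2: emit the byte
                pc += 1
            elif op == "dw":
                if pass_no == 2:
                    value = labels[arg] if arg[0].isalpha() else parse_hex(arg)
                    out += bytes((value & 0xFF, value >> 8))  # low byte first
                pc += 2
            else:
                raise ValueError(f"unsupported op {op!r}")
    return bytes(out), labels
```

For example, `dw later` on the first line can only be filled in because pass 1 already learned the address of `later:`.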

** Variables:

FN$ input filename

PC! next location in RAM to write ('program counter')

PS% pass number

LM! lowest RAM address changed

HM! highest RAM address changed

CL! current line in input file

SC% used for boolean return values from subroutines

NL% boolean indicating whether a newline was read from input file

TK$, TT$ tokens read from input file

OC! instruction opcode

** 10000 single pass

The subroutine at line 10000 handles one pass. It is a big loop which reads and assembles one instruction or directive at a time. The following cases are treated within the loop:

    line number     case
    10060           all instructions listed in A1$
    10100           all instructions listed in A2$
    10150           all instructions listed in A3$
    10200           all instructions listed in A4$
    10250           all instructions listed in A5$
    10300           all instructions listed in A6$
    10350           lxi instruction
    10400           mov instruction
    10450           mvi instruction
    10500           rst instruction
    10550           org assembler directive
    10595           labels
    10610           all conditional return instructions
    10650           all conditional call instructions
    10700           all conditional jump instructions
    10750           end assembler directive
    10800           ds  assembler directive
    10890           db  assembler directive
    10950           dw  assembler directive

** 2000 init

The subroutine at line 2000 sets up some string variables before the first pass.

** 9000 get token

The subroutine at line 9000 gets the next token from the input file. It skips whitespace and comments.

Recognized tokens are:

- a newline
- a comma
- a string of alphanumeric characters followed immediately by a colon
- a string of alphanumeric characters

If the end of the file is reached before any token, TK$="" upon return; else if the token was a newline, NL%=1 upon return; else TK$ contains the token found.
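A rough Python analogue of the tokenizer, assuming the whole file is already in a string and that lines end in a plain newline character (a carriage return would simply be skipped as whitespace here). The function names are hypothetical, and every character code up to 20h (the space) is treated as whitespace.

```python
def is_alnum(ch):
    """Lowercase letter or decimal digit, like subroutine 8200."""
    return "a" <= ch <= "z" or "0" <= ch <= "9"


def tokenize(text):
    """Rough Python analogue of subroutine 9000, applied to a whole string.

    Produces a newline token for each newline, "," for commas, "name:" for
    label definitions, and other runs of lowercase letters/digits as-is.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\n":                  # newline is a token of its own
            tokens.append("\n")
            i += 1
        elif ord(ch) <= 0x20:           # other codes up to 20h are whitespace
            i += 1
        elif ch == ";":                 # comment: skip up to (not past) the newline
            while i < n and text[i] != "\n":
                i += 1
        elif ch == ",":
            tokens.append(",")
            i += 1
        elif is_alnum(ch):
            j = i
            while j < n and is_alnum(text[j]):
                j += 1
            if j < n and text[j] == ":":  # colon directly after the name
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```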

** 5000 hex string to number

The subroutine at line 5000 attempts to read a hexadecimal number in NB$. The number must be in the range 0h to 0ffffh. If successful, upon return SC%=1 and NM! contains the number read. Otherwise, SC%=0.
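The digit arithmetic in subroutine 5000 (subtract 48 for '0'..'9', subtract 87 for 'a'..'f') can be sketched in Python as below. The function name is hypothetical, and the checks for the trailing 'h', for invalid digits and for the 0ffffh limit follow the description above rather than specific lines of the listing. Callers such as line 2410 already make sure the token starts with a digit.

```python
def hex_to_number(nb):
    """Sketch of subroutine 5000: parse an 'h'-terminated hex token like "0ffh".

    Returns the value, or None where the BASIC version would return SC%=0.
    """
    if len(nb) < 2 or nb[-1] != "h":
        return None
    value = 0
    for ch in nb[:-1]:                 # every character except the trailing 'h'
        if "0" <= ch <= "9":
            digit = ord(ch) - 48       # same arithmetic as line 5130
        elif "a" <= ch <= "f":
            digit = ord(ch) - 87       # same arithmetic as line 5140: 'a' -> 10
        else:
            return None
        value = value * 16 + digit     # lines 5150-5160
    return value if value <= 0xFFFF else None
```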

** 8000 isalpha

The subroutine at line 8000 expects a single-character string in CC$. It checks whether the character is a lowercase letter: if so it returns with SC%=1, else it returns with SC%=0.

** 8100 isnum

The subroutine at line 8100 expects a single-character string in CC$. It checks whether the character is a decimal digit: if so it returns with SC%=1, else it returns with SC%=0.

** 8200 isalphanum

The subroutine at line 8200 expects a single-character string in CC$. It checks whether the character is a lowercase letter or a decimal digit: if so it returns with SC%=1, else it returns with SC%=0.

** 8300 get next char

The subroutine at line 8300 reads the next character from the input file, which must not yet be at eof. The character is put in CB$ and, if the character is a newline, NL% is set to 1, else NL% is set to 0. Also, whenever a newline is read, the variable CL! is incremented.

If BF% is nonzero when the subroutine is called, it sets BF% back to 0 and returns whatever it returned the last time it was called, leaving CB$ and NL% unchanged. Setting BF% to 1 before calling therefore "unreads" the last character read from the input file.
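The BF% flag amounts to a one-character pushback buffer. Here is a minimal Python sketch of subroutine 8300, assuming the input is an in-memory string; the class and attribute names are hypothetical and only mirror CB$, NL%, CL! and BF%.

```python
class CharReader:
    """Sketch of subroutine 8300 with the BF% 'unread' flag."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1          # plays the role of CL!
        self.cb = ""           # last character read (CB$)
        self.nl = False        # was it a newline? (NL%)
        self.unread = False    # BF%

    def next_char(self):
        if self.unread:
            # BF% set: clear it and hand back the previous result unchanged
            self.unread = False
            return self.cb, self.nl
        self.cb = self.text[self.pos]   # caller must not read past the end
        self.pos += 1
        self.nl = self.cb == "\n"
        if self.nl:
            self.line += 1              # CL! counts lines as they are read
        return self.cb, self.nl
```

Because an unread character is returned from the saved state, it is never counted twice toward the line number.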

** 8400 iswhitespace

The subroutine at line 8400 expects a single-character string in CC$. It checks whether the character is whitespace: if so it returns with SC%=1, if not it returns with SC%=0. Here we consider any character with ASCII code <= 20h (32, the space character) to be whitespace.

** 2300 get imm1

The subroutine at line 2300 reads the next token from the input file, expecting it to be a valid one-byte immediate operand (either a hex number or label, with value in range 0h to 0ffh). The value of this byte is written to RAM at the current PC! location, and PC! is incremented.

** 2500 get imm2

The subroutine at line 2500 reads the next token from the input file, expecting it to be a valid two-byte immediate operand (either a hex number or label, with value in range 0h to 0ffffh). The value of this word is written to RAM at the current PC! location, and PC! is increased by 2.
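Lines 2600-2630 split a 16-bit operand into a low byte (MD!) and a high byte (DV!), and lines 2580-2590 write the low byte first, which is the 8085's little-endian order. The branch on 32767 appears to exist so that the BASIC integer operators `\` and `MOD` only ever see values in the signed 16-bit range. A Python sketch of the same arithmetic, with hypothetical function names:

```python
def split_word(nm):
    """Mirror lines 2600-2630: split a 16-bit value into (low, high) bytes
    without dividing any number above 32767."""
    if not 0 <= nm <= 0xFFFF:
        raise ValueError("operand out of range 0h..0ffffh")
    if nm > 32767:
        n, high = nm - 32768, 128     # peel off bit 15 first (line 2610)
    else:
        n, high = nm, 0               # line 2620
    high += n // 256                  # DV! = DV! + (N! \ 256)
    low = n % 256                     # MD! = N! MOD 256
    return low, high


def encode_imm1(nm):
    """Like subroutine 2300: a one-byte operand must fit in 0h..0ffh."""
    if not 0 <= nm <= 0xFF:
        raise ValueError("operand out of range 0h..0ffh")
    return bytes([nm])


def encode_imm2(nm):
    """Like subroutine 2500: low byte first, then high byte."""
    low, high = split_word(nm)
    return bytes([low, high])
```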

** 2700 get number

The subroutine at line 2700 reads the next token from the input file, expecting it to be a valid immediate operand (either a hex number or label, with value in range 0h to 0ffffh). The token read is returned in TK$ and the value of the operand is returned in NM!. Note that if an undefined label is read, this is allowed during the first pass but not during the second pass.
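A sketch of how an operand token could be resolved under these rules: tokens starting with a digit are hex numbers, anything else is a label, and an unknown label yields 0 during pass 1 (line 2740 sets `TK$="0":NM!=0`) but is an error in pass 2. The function signature and the use of a Python dict for the label table are assumptions.

```python
def get_number(tok, labels, pass_no):
    """Hypothetical analogue of subroutine 2700: resolve an operand token."""
    if "0" <= tok[0] <= "9":
        if not tok.endswith("h"):
            raise ValueError(f"bad number {tok!r}")
        value = int(tok[:-1], 16)          # raises ValueError on bad digits
    elif tok in labels:
        value = labels[tok]
    elif pass_no == 1:
        # Forward reference: the label may be defined later in the file,
        # so use a placeholder of 0 for now.
        value = 0
    else:
        raise KeyError(f"undefined label {tok!r}")
    if value > 0xFFFF:
        raise ValueError("operand out of range 0h..0ffffh")
    return value
```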

** 2800 poke

The subroutine at line 2800 expects BB! to contain a number between 0h and 0ffh, and PC! to contain the address in RAM where BB! should be written. During the first pass, this subroutine just keeps track of the range of RAM to be written (in LM! and HM!). During the second pass, this subroutine actually writes to RAM.
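A Python sketch of the poke routine's two behaviours, using a simulated 64 KB `bytearray` instead of real memory. The class name and the `in_bounds` helper are hypothetical; `in_bounds` stands in for the HIMEM/MAXRAM check that main performs between passes, with both limits taken as plain parameters.

```python
class RamWriter:
    """Sketch of subroutine 2800 (hypothetical name).

    Pass 1 only records the lowest and highest addresses that would be
    written (LM! and HM!); pass 2 stores bytes into a simulated 64 KB RAM.
    """

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.lo, self.hi = 0xFFFF, 0    # same starting values as line 1000
        self.pass_no = 1
        self.pc = 0

    def poke(self, bb):
        if self.pass_no == 1:
            self.lo = min(self.lo, self.pc)
            self.hi = max(self.hi, self.pc)
        else:
            self.ram[self.pc] = bb
        self.pc += 1                     # PC! advances in both passes

    def in_bounds(self, himem, maxram):
        """Hypothetical check: every write falls in HIMEM..MAXRAM-1."""
        return himem <= self.lo and self.hi <= maxram - 1
```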


I needed a way to associate string "keys" with values. I did this with 'MAP' strings in the form ".key1.val1.key2.val2... .keyn.valn". Every key must start with a lowercase letter, and the values must be decimal numbers.

Originally I was using such a map to store labels and their values, but this didn't work out so well because BASIC strings can be at most 255 characters. So I made another type of map just for labels, which uses arrays for storage.

** 3000 string map lookup

The subroutine at line 3000 looks up the value associated with key SS$ in MAP MP$. If the key is found, the value is put in NM! and the subroutine returns with SC%=1. Otherwise it returns with SC%=0.
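The MAP lookup boils down to a substring search for `.key.`. Below is a Python sketch of subroutine 3000 (the function name is hypothetical), where `None` plays the role of SC%=0. Python's `str.find` is 0-based and returns -1 when nothing is found, whereas BASIC's `INSTR` is 1-based and returns 0.

```python
def map_lookup(mp, key):
    """Sketch of subroutine 3000: find ".key." in a MAP string and return the
    decimal value that follows it, or None if the key is absent."""
    p = mp.find("." + key + ".")        # INSTR(MP$, "." + SS$ + ".")
    if p < 0:
        return None
    rest = mp[p + len(key) + 2:]        # text after ".key." (line 3040)
    end = rest.find(".")
    if end >= 0:
        rest = rest[:end]               # stop at the next separator (3041-3048)
    return int(rest)                    # VAL(T$)
```

Since keys start with a letter and values are all digits, a search for `.key.` can never accidentally match inside a value.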

** 3200 add field into opcode

The subroutine at line 3200 gets the next token from the input file and uses it as a key to look up in the MAP string MP$. The value found is added into OC!.

** 12000 label lookup

The subroutine at line 12000 looks for the label in SS$ in the label map. If it is found, the subroutine returns the associated value in NM! with SC%=1; otherwise it returns with SC%=0, and NM! should not be used.

** 12500 define label

The subroutine at line 12500 adds the label/value pair in A$/PC! to the label map.
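A Python sketch of the array-based label map, assuming the same 160-entry limit as the `DIM` statements at lines 2100-2110 and keeping index 0 unused as the BASIC code does. The class and method names are hypothetical; lookup is a linear scan, which seems acceptable for at most 160 entries.

```python
MAX_LABELS = 160   # matches DIM LB$(160) / DIM LB!(160)


class LabelMap:
    """Parallel arrays of names (LB$) and values (LB!), count in MI%."""

    def __init__(self):
        self.names = [""] * (MAX_LABELS + 1)   # index 0 unused
        self.values = [0] * (MAX_LABELS + 1)
        self.count = 0                          # MI%

    def define(self, name, pc):
        """Like subroutine 12500: append a label; error when the table is full."""
        if self.count >= MAX_LABELS:
            raise OverflowError("too many labels")
        self.count += 1
        self.names[self.count] = name
        self.values[self.count] = pc

    def lookup(self, name):
        """Like subroutine 12000: returns None where SC% would be 0."""
        for i in range(1, self.count + 1):
            if self.names[i] == name:
                return self.values[i]
        return None
```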


A1$ contains opcodes for instructions which take no operands.

A2$ contains opcodes which take one operand, the name of a register, which is encoded into the opcode by adding the value in B2$ associated with the register.

A3$ contains opcodes which take one operand, the name of a register, which is encoded into the opcode by adding the value in B3$ associated with the register.

A4$ contains opcodes which take one operand, the name of a register pair, which is encoded into the opcode by adding the value in B4$ associated with the register pair.

A5$ contains opcodes which take a one-byte immediate operand.

A6$ contains opcodes which take a two-byte immediate operand.

B2$ see A2$

B3$ see A3$

B4$ see A4$

B5$ is used for the rXX, jXX and cXX instructions, XX being the condition code. The condition code is encoded into bits 5, 4, 3 of the opcode by adding the value associated with the condition code in B5$.
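To make the table-driven encoding concrete, here is a Python sketch that parses the MAP strings from lines 2010-2080 and adds a base opcode to the operand's field, as the A2$/A3$/A4$ paths and the `mov`, `lxi`, `mvi` and `rst` cases of the main loop do. The function names are hypothetical. B5$ is empty in the listing above, so the condition-code values here are an assumption: the standard 8085 codes (nz, z, nc, c, po, pe, p, m) already shifted into bits 5, 4, 3.

```python
def parse_map(mp):
    """Turn a MAP string like ".a.7.b.0" into a dict {"a": 7, "b": 0}."""
    parts = mp.strip(".").split(".")
    return {parts[i]: int(parts[i + 1]) for i in range(0, len(parts), 2)}


# Tables copied from lines 2010-2080 of the listing
A2 = parse_map(".adc.136.add.128.ana.160.cmp.184.ora.176.sbb.152.sub.144.xra.168")
A3 = parse_map(".dcr.5.inr.4")
A4 = parse_map(".dad.9.dcx.11.inx.3.ldax.10.pop.193.push.197.stax.2")
B2 = parse_map(".a.7.b.0.c.1.d.2.e.3.h.4.l.5.m.6")
B3 = parse_map(".a.56.b.0.c.8.d.16.e.24.h.32.l.40.m.48")
B4 = parse_map(".b.0.d.16.h.32.sp.48.psw.48")
# Assumption: standard 8085 condition codes, pre-shifted into bits 5..3.
B5 = {"nz": 0, "z": 8, "nc": 16, "c": 24, "po": 32, "pe": 40, "p": 48, "m": 56}


def encode(mnemonic, *operands):
    """Return the opcode byte (immediate operands are not handled here)."""
    if mnemonic in A2:                      # register in bits 2..0
        return A2[mnemonic] + B2[operands[0]]
    if mnemonic in A3:                      # register in bits 5..3
        return A3[mnemonic] + B3[operands[0]]
    if mnemonic in A4:                      # register pair in bits 5..4
        return A4[mnemonic] + B4[operands[0]]
    if mnemonic == "mov":                   # 64 + destination + source
        return 64 + B3[operands[0]] + B2[operands[1]]
    if mnemonic == "lxi":                   # 1 + register pair
        return 1 + B4[operands[0]]
    if mnemonic == "mvi":                   # 6 + register
        return 6 + B3[operands[0]]
    if mnemonic == "rst":                   # 199 + 8 * n
        return 199 + 8 * operands[0]
    for prefix, base in (("r", 192), ("c", 196), ("j", 194)):
        if mnemonic[0] == prefix and mnemonic[1:] in B5:
            return base + B5[mnemonic[1:]]
    raise KeyError(mnemonic)
```

For example, `mov a,b` works out to 64 + 56 + 0 = 78h, which is the 8085 encoding of MOV A,B.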