Close

dflat - a computer language

A project log for 65c02 Homebrew Computer on breadboard

Custom designed breadboard computer (BBC!) with bespoke programming language, full graphics, sound and SD card storage.

6502nerd6502Nerd 12/16/2015 at 19:123 Comments

Update September 2016 :

I've added a token table which describes all the tokens that dflat understands - basically this is the list of statements, commands, functions and operators.

Core Features

The core features of dflat are:

Keywords

I thought perhaps I should just dump the keywords, functions and operators that dflat supports. Anyone with a memory of BBC, Atari or Oric BASIC will know most of these commands.

Current list of supported symbols:

Some of you may recognise the sound commands - they are based on the approach used by Oric BASIC. This is both for nostalgia reasons and because both my computer and the Oric use the same sound chip.

I have a few unresolved bugs / limitations which I need to figure out and do something about:

I now have the following features:

Benchmarks

dflat has enough capability to try out benchmarks that test the performance of the language. I have used PCW (a UK monthly computer magazine) BASIC benchmarks from the early-mid 80s as this represents the closest era that my homebrew is equivalent to. The most complex benchmark tests the trig functions, but as my language is integer only (for now), the next most complex benchmark (number 7) is the one I have used.

I have tried to reproduce the original benchmark using dflat as faithfully as reasonable without trying to optimise further. Here is original and dflat code:

PCW Benchmark from early 1980s:
20 LET k=0
25 DIM m(5)
30 LET k=k+1
40 LET a=k/2*3+4-5
45 GOSUB 700
46 FOR l=1 TO 5
47 LET m(l)=a
48 NEXT l
50 IF k<1000 THEN GOTO 30
700 RETURN

dflat equivalent from 2016!
100 def_bm7()
110 println "Start"
200 %k=0
210 dim %m[5]
220 repeat
230  %k=%k+1
240  %a=((%k/2)*3)+4-5
245  _sub()
260  for %l=1,5,1
270   %m[%l]=%a
290  next
300 until %k==1000
305 println "Stop"
310 enddef
320 def_sub()
330 enddef

I got the benchmarks from the following site (* don't click on the nonsense adverts *):

http://www.geocities.ws/peterochocki/computers/pcwbm.html

Near the bottom of this page is a very interesting table of benchmarks for various old to new(ish) computers. I mentioned before that my expectation was that my restricted integer-only language should be quite a bit faster than most 1980s 6502 and Z80 based systems, save for the BBC Micro which is known to have a very speedy implementation.

I ran the benchmark a few times, and the result is....drum roll.... 7.5 seconds. I am very happy with this, it easily beats all the 6502 and Z80 machines of that era - including the BBC. Typical timings are ZX Spectrum 77.6s, Commodore 64 47.5, Atari 800XL 60.1s, ZX80 (integer only) 39.2s, BBC B 21.9s.

However, I need to acknowledge that aside from the simplicity (and therefore speed) of only working with integers, my system is also clocked at 2.68Mhz, which is significantly faster than all the other machines including the BBC. However, even adjusting for clock speed, dflat is significantly faster, and that is down to the major factor of unsigned integer arithmetic, plus extensive tokenisation and optimisation before and during runtime. For example:

I will now go on to some essential functions (save, load) followed by some useful graphics and sound functions.

The main challenge now is ROM space. Currently I have less than 1.5K remaining. This is starting to feel tight as I have a few more features to implement (e.g. more graphics and sound), but I haven't really space optimised the code so hopefully it will be enough. Plus I know that I can free up almost 1K of font data if I am desperate.

Introduction

This log is going to document my investigations and implementation of an interpreted language capable of running from ROM. Once doing this, my system will have the built in facility to write, debug and run programs without any external dependency - at the moment I have to use an assembler on the PC.

Initial Research

So, where to start in building a computer language? This has been done many times before of course, but the purpose of doing it for myself is the personal learning and hobby interest I have in knowing how stuff works. And of course the slightly egotistical pleasure of having made a language for my own requirements.

I have done a lot of background reading, relearning about lexers, parsers etc. I say relearning because this is taking back to my computer science undergraduate days - it's reassuring to find the concepts haven't changed dramatically.

The academic side of understanding how to design and build a programming language is good, but I have real world constraints - an 8 bit CPU with limited RAM and ROM!

I looked at Forth - a very compact language, fairly well suited for 6502. In fact I remember the 48k of the Oric-1 came with Forth on tape (alas, I had the 16k version). So if it can work on a 1Mhz 6502, I am sure it would be fine on my homebrew. But the reason for the compactness of Forth is the stack orientation, terseness and reverse polish notation which basically means that expression evaluation and parsing is made rather more straightforward.

So Forth is out, it doesn't feel like 'progress' enough for me! My first language was BASIC (I am of that age), so might be a natural choice. But I wanted to make an advance on 80s BASIC by having no line numbers - and therefore all program control is through structured constructs like procedures, functions and blocks.

The implication of no line numbers is that I would need a full-screen editor capability and this was going to take yet more ROM space (which elsewhere I have noted I am using up quite rapidly).

So I have decided that my language will use line numbers - but only as a way of inserting, editing and removing lines. The first execution statement will not be the lowest line number, but a 'main' type procedure. All flow of control will be through structured programming constructs - there will be no GOTO or GOSUB!

So I now need to develop the syntax. I'll post examples of the syntax as I progress - still thinking through the grammar.

Syntax

I spent a while trying to work out a syntax which was human readable but also easy for the machine to decode. Keeping it easy for a 6502 to decode will help me keep the code small and running fast.

Variables:

All variables must be pre-fixed with a type - one of $, %, ^ or ! e.g.

I've decided not to support floating point right now - need to get something working first.

The program editor uses line numbers, but does not have control and flow keywords which refer to line numbers directly - the line numbers are purely to organise code in memory. Some examples below:

100 ; This is a comment
110 ; Next line is an example of a conditional.
115 ; ':' separates statements on same line
120 if %a > %b then:%a = %b:else:%b = %a:endif
130 ; _ is reserved to defined and invoke user functions
140 _myfunction(1, 2)
150 ; The next line defines a user function
160 def_myfunction(%a, %b)
162 ; make variables local scope explicitly else they are
163 ; globally accessible
165 local %a
170 return %a + %b
180 enddef
190 ; for, while and repeat all supported
200 repeat
210 %a = %a + 1
220 until %a > 10
230 while %a <> 0
240 %a = %a - 1
250 wend
275 ; for loop syntax simplified to specifying start, end, step
270 for %a = 1, 10, 1
280 println %a
290 next %a
295 ; strings have to be dimensioned before use, as do number arrays
300 dim $a[10], %a[10,10]
310 ; Note in the above that $a and %a are different variables

Will stop there for the moment - let me know what you think of the syntax. It has been inspired by various influences including:

Whether I get all the constructs working the way I want will depend on space.

Numeric expression tokenisation and runtime

This rather wordy subtitle is capturing some notes on progress. One of the more complex areas I have come across is expression parsing and execution. Let me give an abstract example of an expression using a simplified BNF type notation, restricted to numbers only:

<expr> = <term>[<op><term>]

<term> = digit[digit]

digit = 0..9

<op> = + | -

The above notation allows simple expressions like

Great, I can basically parse a syntax tree like this linearly. But what if a term needs to be more complex i.e.

<term> = digit[digit] | (<expr>)

So this allows me to make expressions like

This is all still quite graspable I hope you will agree. Basically <expr> is recursive as it refers to itself (as part of the definition of <term>, and nicely implementable using a friendly language like C or Java, but how to do it nicely using a lowly 6502 with 3 registers!?!?

The main thing to worry about at assembly level is saving state when recursing, or trying to eliminate state altogether. Actually that's not possible - we have to know where we recursed from to be able to return to that point. But that part is looked after by the 6502 itself - making a subroutine call (JSR) automatically pushes the program counter on to the processor stack.

The interpreter takes a line of input and attempts to tokenise and syntax check before saving to program space (or executing if it is in immediate mode). I have an input pointer and output pointer which tell me where the next character to be processed from of the input is, and where the next tokenised character will go. I only advance these pointers when I am committed to taking them. Examples of where I may not be committed is checking the operator symbol table to match against multi-byte characters (the above simplified example misses operators like '<<' and '>>'.

So that means I can have an assembly routine called (say) tok_nexpr to tokenise a numeric expression. I only advance the input pointer when I am committed to consuming it, and when I see the '(', I call tok_nexpr again. On return I do a mandatory check for ')' as part of the syntax check.

So that looks pretty straightforward, once I had built all the scaffolding routines to deal with non-bracketed expressions, it was easy to extend to arbitrarily complex expressions to be tokenised.

But the runtime is more complex. Now we're not simply checking the input stream, but executing the tokenised output stream. An expression like 1 + (2 - 3) requires executing in the correct order - i.e. the brackets have to be evaluated before the rest of the expression.

I do this by maintaining an argument and operator stack when executing the numeric evaluation routine called (say) rt_neval. As the runtime is executing, any 'terms' it sees are pushed on the argument stack and any 'operators' are pushed on to the operator stack. When the routine first runs, it pushes a 'floor' value on the operator stack so it knows how much of this stack was used during this evaluation. When the routine has pushed all arguments and operators, it is ready to process. It does this simply by popping an operator, and then invoking the operator code (e.g. the '+' operator will invoke the addition routine). The operator code pops two values off the stack, does the appropriate calculation and pushes the result back on the stack. The next operators is then popped, and so on until the 'floor' value is found, when means this iteration of the routine has completed. At this point a result is on the argument stack, and the routine returns.

When a bracket is encountered, the rt_eval is invoked recursively, and when it eventually returns, a close bracket is expected (it will always be present as the tokeniser checked for this).

Having just written all this down sounds quite simple! Recursion in assembly can be a bit mind numbing - the trick is to not think too hard about the thread of execution, but concentrate on the logic and what needs to be saved and restored during recursive iteration. The beauty of recursion is that if it is got right, then the complexity that it can handle is a natural outflow of the algorithm (see picture for a couple of small examples).

So, what this all means is that I basically have a functional tokeniser and runtime evaluation routine for numbers. Strings will be easier as I will simply process string expressions left to right with no operator precedence. In fact I am feeling lazy at the moment so I may not implement it even for the numeric evaluation routine - the reason is that I can always guarantee an order of evaluation using brackets. That will avoid having to re-order arguments and operators using the well documented 'shunting yard' algorithm first published by Djikstra (if you haven't read about him, please do, he's a really key figure in computer science).

UPDATE: I have implemented operator precedence and it was rather easier than I thought it would be. When I actually started to look at this properly, I realised that I didn't need to use Dijkstra's algorithm at all. This is due to the fact that rather than pushing arguments and operators on to their stacks and then processing them all in stack order, I can simply do the following to implement operator precedence:

I am now going to busy myself finalising variable handling (I need to be able to declare and access arrays), and then I can turn to iteration and conditional structures - at the moment the interpreter runs sequentially through each line, which doesn't make for being able to make very interesting programs!

The core dflat now has additional useful commands to help with games, namely:

I'm sure you can guess where this is heading.. So I will be creating a simple game using graphics and sound to demonstrate both the hardware working and the speed of dflat.

I have built enough of the core language to try out benchmarks from the 1980s. A popular and credible magazine of the 1980s in the UK was Personal Computer World (PCW). They had a range of benchmarks to test review machines, so I have subjected my system to one of these as a trial. The results have impressed me - dflat is quite fast! See later in the blog for more information.

Discussions

Ed S wrote 09/21/2016 at 18:05 point

Yes, there must be a tension between creating a language interface which is very close to what the specific hardware does, and creating one which is general (portable) but can't make the most of any specific chip.

  Are you sure? yes | no

Ed S wrote 09/17/2016 at 15:40 point

Very interesting! Is it worth chatting to Peter Noyes and sharing notes on what commands are needed to support games programming, and what the interfaces should look like? Even if you can't agree, you might each learn from the other's ideas!

(See his Dodo post here: http://forum.6502.org/viewtopic.php?f=2&t=4257 )

  Are you sure? yes | no

6502Nerd wrote 09/20/2016 at 21:21 point

Hi BigEd.  Yes I'm aware of Peter's excellent efforts, it's very good indeed.  I will take a close look at Peter's software thoughts.  Mine choices were influenced directly by the hardware I'm using (e.g. the TM9918 sprite capability and the AY-3-8910 sound controls) and the BASIC languages I used (mainly Oric-1 and Atari 800XL).

  Are you sure? yes | no