• Acceleration

    Peter • 02/19/2020 at 18:48 • 5 comments

    A friend asked me how close I was to done. I explained that normal operation of a Forth Interpreter is (from the standard):

    1. Skip leading spaces and parse a name;
    2. Search the dictionary name space. If a definition name matching the string is found:
      1. if interpreting, perform the interpretation semantics of the definition, and continue at 1).
      2. if compiling, perform the compilation semantics of the definition, and continue at 1).
    3. If a definition name matching the string is not found, attempt to convert the string to a number. If successful:
      1. if interpreting, place the number on the data stack, and continue at 1);
      2. if compiling, compile code that when executed will place the number on the stack, and continue at 1);

    and that out of all of this process, I currently have implemented "if". 

    Actually though, that doesn't describe where I am at all. Because my Forth, like most Forths, is itself implemented in Forth, this loop will be almost the last thing that I write, and almost every word I implement brings me closer to being able to write this interpreter loop.
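
    For what it's worth, the interpreting half of that loop is tiny once the words themselves exist. Here's a toy restatement in Python (purely illustrative: the dictionary, the stack and the number handling are all stand-ins, and the compiling branch is left out):

    stack = []
    dictionary = {"+": lambda: stack.append(stack.pop() + stack.pop())}

    def interpret(line):
        for name in line.split():
            if name in dictionary:
                dictionary[name]()        # found: perform its semantics
            else:
                stack.append(int(name))   # not found: try it as a number

    interpret("1 2 +")
    print(stack)                          # [3]

    The hard part isn't this loop - it's having SWAP, DUP, +, IF and all the rest working underneath it.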

    Since the last update I've managed to get the cross-compiler / bootstrap process working for simple words. It's full of hacks, but the code it outputs is correct, and it's accelerated the rate of implementing the standard Forth words no end. Each thing I add unlocks a bunch more easy words to implement. At times it felt like I was in the middle of a great book that I couldn't put down.

    I've added ASM words for conditional and unconditional branching, which, combined with a mechanism in the bootstrap Forth for marking branch targets (using assembler labels), gave me the ability to use IF, ELSE and THEN, which unlocked ?DUP and more.

    I added 0= and the various binary bitwise words, which unlocked INVERT, 0<>, =, NEGATE, 0< and more.

    I added +, which unlocked -, < etc.

    Last year it took me more than a month to write enough assembler code to fill a ROM page. Earlier this year I filled one in two days. Some might say that's because my encoding is not very dense, but I'd rather think of it as incredible progress! I currently have 34 of the 168 words in the core and core ext. lists, so maybe I'm really 20% of the way there. I think in truth I'm much closer.

    I've recently taken a bit of a break from working on this project (following a few further trips down yak-shaving rabbit holes). My next step will be to get input and output working, by providing a means of wrapping the SYS functions built in to the Gigatron ROM as Forth words. Part of this is having a means of entering and re-entering Forth from vCPU, while making sure that all of the timings line up correctly. Going back to worrying about timing is one of the things that has put me off a bit. 

    Hopefully my next update here will be an explanation of how Forth words work in ROM, as I think my previous update wasn't clear, and it will be very useful for anyone who wants to look at my code.

  • Words

    Peter • 01/24/2020 at 08:44 • 1 comment

    Before Christmas I made slow but steady progress implementing various core words. I have written various stack manipulation words like ROT, SWAP, DROP and DUP, the memory words @, !, C@, C!, and the arithmetic words 1+ and 1-. I have also written the code for entering and exiting a thread: DOCOL and EXIT.

    The experience of writing all of this code in assembly has been a mixture of fun and frustration. On the one hand it's a nice exercise in that it's just creative enough to be interesting, without allowing much space for analysis paralysis. The sort of task where you can keep putting one foot in front of another. On the other hand it's so slow! I don't think I've ever got a single routine correct the first time, and often my first effort is hilariously wrong. Even with my fairly decent test approach, working out what I've done wrong always seems to involve single-stepping my code, usually multiple times, which is a very laborious process. Often I'm working on this in odd half-hours, and this really means that tasks which feel like they should be done in one session end up taking several. I'm really glad I've aimed for good unit test coverage, otherwise I'd be lost.

    I'm now at a point where I can start to think about defining words in terms of other words. For example, a definition of NIP could be:

    : NIP SWAP DROP ;
    

     I could re-write this in Python as

    def nip():
        label("forth.core.NIP")
        docol()
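        # One thread entry per word: write the low byte of the word's address,
        # jump to NEXT, and write the high byte in the jmp's branch delay slot.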
        for word in ["SWAP", "DROP", "EXIT"]:
            st(lo("forth.core." + word), [Y, Xpp])
            jmp(Y, "forth.next3.rom-mode-tail")
            st(hi("forth.core." + word), [Y, Xpp])
    

    However if I follow that style, I will end up writing a whole Forth in Python-generated assembly, which isn't really the path I want to go down.

    Instead I'd like a tool that can process the Forth definition above, and issue the same Python function calls. Many of the Forth-syntax definitions for Forth words would be quite complicated to process, so really this needs to be a full Forth implementation in Python. Of course, I want to write as much of this in Forth as possible too, because otherwise I have to write everything twice (or perhaps even three times). None of the existing Forth in Python projects that I've seen quite fit the bill.
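
    To make that concrete, the smallest useful piece of such a tool might look like the sketch below. This isn't my actual code (a real version has to deal with immediate words, numbers, comments and so on), but it shows the shape of the thing:

    def parse_definitions(source):
        """Yield (name, body words) for each ': NAME ... ;' colon definition."""
        tokens = source.split()
        i = 0
        while i < len(tokens):
            if tokens[i] != ":":
                raise ValueError("expected ':', got %r" % tokens[i])
            end = tokens.index(";", i + 2)             # find the matching ';'
            yield tokens[i + 1], tokens[i + 2:end] + ["EXIT"]
            i = end + 1

    print(list(parse_definitions(": NIP SWAP DROP ;")))
    # [('NIP', ['SWAP', 'DROP', 'EXIT'])]

    A backend could then walk each body and emit exactly the st / jmp / st calls from the nip() function above.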

    That Forth in Python is what I've been working on so far this year (although without spending much time on it), and I've finally got to a point where it does something. I can run simple definitions in an interactive session, and see the results, and while there are many words written in Python, the interpreter loop and several core words are written in Forth and loaded during a bootstrapping process. This includes ;, ['] and \. I think it has promise. My next step is to get it to generate Gigatron code, and NIP is the first word I shall be trying to compile.

    There's still a lot of assembly to write, but it will be good to be able to write Forth too.

  • An Exciting Development

    Peter • 12/26/2019 at 20:40 • 0 comments

    Look at the present that my wonderful wife bought for my birthday!

  • Pampered Bovids

    Peter • 12/07/2019 at 11:59 • 2 comments

    Well it happened: I went down a yak-shaving rabbit hole.

    The thinking went like this:

    • As I write more words in Python generated assembly I'm having to spend a lot of time on both getting stack manipulation code right, and counting the number of instructions. The more I lean on Python to make my code simpler, the more complicated the counting gets.
    • I'd therefore like to be able to assemble a piece of code into a buffer, and just measure the length of the buffer (there's a sketch of a simpler take on this just after the list).
    • This means that I'm going to need to make changes to asm.py, which I've copied into my private area.
    • If I make changes to this file I need to be able to merge upstream changes, which is difficult because git doesn't really understand that the files are related.
    • I need a script to help with this: make a branch which contains unmodified upstream files, but in the right locations, and merge that into my branch. This is Yak 1.
    • If I'm going to make big changes to asm.py, I really don't want any big upstream changes to happen which I have to merge. But there was a ticket to port all of the files to Python 3, which I anticipated might be such a change.
    • So perhaps I should port asm.py myself. This is Yak 2.
    • I won't really know if my porting work is correct unless dev.py is ported too, and all of the other files it uses. Yak 3.
    • Then there's the work to refactor asm.py itself. 4.
    • Once this is done I can look at writing helpers for stack manipulation. This is not as simple as writing push() and pop() macros - at least not if I want the code to be close to as good as hand-written. Because of the way that the Gigatron instruction set works, most memory accesses use the X register, but the X register can only be set, used to produce a memory address, or incremented (and only then in combination with writes), and can never be read. To write efficient stack code you need to look at several operations at once: my hand-written code sometimes updates the stack pointer variable and sometimes doesn't, and if I can batch several writes together, I fill the stack top down (or up really, the stacks grow downwards) using the st [y, x++] instruction. So writing Python code that can work out which instructions are appropriate amounts to a small optimising compiler. A big five!
    • It would be really handy if someone wrote a nanopass compiler framework for Python: The rest of the herd.
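
    As an aside, here's roughly what I mean by the buffer idea near the top of that list - or rather a much cheaper variant that just reads the assembly position before and after a block of code. I'm assuming asm.py exposes the current address as pc(); the helper itself is invented:

    from asm import pc

    class measured:
        """Count how many instructions a `with` block emits (one ROM word each)."""
        def __enter__(self):
            self.start = pc()
            return self
        def __exit__(self, *exc):
            self.length = pc() - self.start

    # with measured() as body:
    #     ...emit instructions as usual...
    # body.length is then the instruction count

    It doesn't solve the placement problem - you only learn the length after the code has already been emitted - which is why the buffer version still appeals.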

    I actually shaved yaks 1 to 3. The git script was fun, and has proved useful, and taught me a lot. The Python 3 port is also nice - and is useful to the wider Gigatron project.

    I spent a lot of time vacillating between the "compiler" and the asm.py changes, and worrying that both are a waste of time. On the one hand it's kinda stupid to feel guilty about wasting time when the whole project is just for fun; why not follow an impulse to write some code if I want to? On the other hand, I would actually like to complete this at some point!

    I think a big problem for me right now is that I don't really know how hard it's going to be to get this finished without shaving the yaks, and I don't know how much work these side-lines would be, so I can't weigh their cost against their benefit. But if I push on without them and it turns out that they would have been useful, I'll have missed an opportunity to benefit from the effort.

    For now I've put both projects to one side, and am trying to make progress on implementing more words. More on this in the next update. I think I might well come back to the instruction counting work soon, but I'm going to go without the stack operation optimiser. I think this may come back in the form of an inliner and peep-hole optimiser to convert words written in Forth to Gigatron code. I've had an itch to write a compiler for a year now, and this might be the opportunity to scratch it.

  • NEXT steps

    Peter • 11/23/2019 at 00:11 • 0 comments

    [This log relates to the state of the project as of  commit b58f04048e080c8397b921c5705d2b3de66b05e4]

    I can't believe that I've been working on this for 24 days, with such a small amount to show for it. I guess they weren't joking when they said implementing Forth in the native instruction set would be hard! It's not that I haven't been putting any effort into it - I really have!

    I've finally finished implementing the various NEXT routines, with pretty good tests. However I'm now convinced that my original design was just bad. All of the fail-fast tests and the jumping from one routine to another lead to dispatch having a very large cost - whilst also causing me to do a huge amount of thinking just to work out what the code should do. As someone who is not good at arithmetic, and gets easily confused by pointers, this has proved to be a challenge.

    https://docs.google.com/spreadsheets/d/1FN_2EKFwPx6gkxwS5qprr5ikTHDJjQV4124Q-Ay5Tx4/edit?usp=sharing

    My original design goal was to maximise the number of instructions a native Forth word can contain without breaking the time window. The spreadsheet linked above is one I made to find out how big the dispatch overhead is going to be, and whether the system will even work in terms of the timings (or does the cost of dispatch dwarf the number of instructions a word gets to run?). The chart in it shows what happens if you enter a time-slice with different-length words in the 'W' register (word length is the X-axis). In this particular case I'm examining the '---D line 0' time-slice, which is the shortest at 114 cycles, and we're assuming that we're executing code from ROM and that the word ends with a call to NEXT. All of these can be changed in the spreadsheet.

    As you can see, it works: if we run a word with 0 - 32 "real-work" instructions, we'll finish it, dispatch to the next word, and possibly start another word, all within our window. If our word is between 33 and 48 instructions, we will load the W register with the address of the next word, but won't dispatch to it (the steps are the various points it can fail along the path). If our word is 48 to 82 cycles long, we will complete the work and queue the dispatch, but won't attempt it. Greater than that, and we can't do the required book-keeping, so we don't enter the word.

    I definitely want to come back to this area of the code, and get that big red stripe narrower, but for now, it's going to be good enough.

    Next up: DOCOL and EXIT. I'm thinking of writing some Python utilities to do the code-counting for me - doing it by hand is pretty tedious!

    P.S. Spreadsheets are a trap. It's so easy to get started, but a nightmare to keep track once they get a little bit big.

  • Lost a day

    Peter • 11/14/2019 at 08:35 • 0 comments

    I've been making slow but steady progress over the last little while, having implemented next1, next2 and next1-reenter. I've even written pretty good unit tests for some of them. This has been a hard process; getting the tests right is harder than getting the code right!

    Yesterday it occurred to me that I could string all the routines together, saving on the jumps. I tried it out, and while neat, it actually made the code slower. So then I decided to go the other way - inline everything. This makes the code faster in some cases, but massively changes the invariants of the routines, meaning the tests need to be rewritten. This whole business took most of my available time for a day.

    So now I have a dilemma: carry on refactoring, for only a possible benefit (let's face it, there's no real-world benefit to any of this), or go back to my suboptimal code and push forward with that. Neither sounds good. I've also come up with a few more approaches to saving time in dispatch.

    This is what I meant about shaving yaks!

    Part of the problem is that I don't really have a good way of evaluating any solution right now.

    I'm going to commit my rewritten code as a branch, and revert.

    In the forum, Marcel posted a vCPU implementation of parts of a Forth kernel, and it's remarkably small compared to mine. I intend to post in the forum when my version reaches parity with his.

  • Up to date

    Peter • 11/05/2019 at 13:56 • 0 comments

    [This post brings me up to date with the work I've done so far, and corresponds to the state in commit 0e4b928083ae69bb6ae577197ac016e93a77e80e]

    I've documented my design for the NEXT routines, which I've committed to GitHub here. It's a little sketchy, and I can already see mistakes. I haven't run, or even assembled, any of the code yet, but hopefully I haven't made up any of the op-codes, even if some of my assembly syntax is invented! I think the big ideas are clear and I shan't try to explain it again here. I originally wrote that document to post to the Gigatron forum, but I decided it was a bit long to post at this stage.

    One thing I will say about that design is that I'm really disappointed that the ROM-mode version of NEXT3 is so expensive. I'd really like running threads in ROM to be cheaper than running threads in RAM. I'm aware that I can encode jumps inline in my instructions, which would vastly speed things up by removing the double-jump dispatch loop; however, it would add 50% to the space usage. I think the current design will probably work, and I can switch to inline jumps later if it turns out that speed is more valuable than space. Also, I really love the double-jump trick - it's so cool!

    I'd like to talk a little more about the general approach I'm planning to follow in developing my Forth. I think it can be summarised as:

    1. Try to avoid shaving any yaks.
    2. Prefer automation to manual work, except where such automation would conflict with the First Law.
    3. Prefer writing Forth to Python, except where writing Forth would conflict with the First or Second Law.
    4. Prefer using Python to write assembly to writing assembly by hand, except where writing Python would conflict with any of the other laws.

    In practice, I intend to lean very heavily on Python, following the existing approach of using it as a "macro assembler", but also trying to use it to unit test my code, and probably even having a Forth compiler in Python for generating embedded Forth threads.
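
    By "macro" I just mean a Python function that expands into a handful of native instructions wherever it's called. Something like the helper below (invented for illustration - ld and st are the existing asm.py functions):

    def store_constant(value, address):
        """Emit code to store a 16-bit constant into a zero-page pair."""
        ld(value & 0xFF)
        st([address])            # low byte
        ld(value >> 8)
        st([address + 1])        # high byte

    Nothing clever, but it keeps the repetitive parts of the code in one place.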

    The first 'law' about not shaving yaks is important. I find it easy to get bogged down in minor details and lose velocity. I'm going to try to enjoy, and even celebrate kicking cans down the road. To that end, here are some yaks I have left unshaven so far:

    • The various Python modules / scripts for generating the ROM are in Python 2. I'd rather be using Python 3, and they don't look hard to port. Not the task at hand though!
    • I'm clearly going to need to add to the ROM, but all of the space is used by vCPU applications and data. Am I going to work out the conflicts? No, I'll cross that bridge if I come to it. I've copied the dev.py file and the bare minimum dependencies into my contrib area. If I prove the concept, and want to merge later, I'll work out what I've broken.
    • I want a Python Gigatron emulator, in order to unit test my code from Python. A Python implementation would be straightforward, but not an insignificant amount of effort. Instead I wrapped gtemu.c as a Python extension using cffi.
    • I should probably have some proper build automation, at least a Makefile, but once again, I'm on Windows, no make installed, not going to bother. A bad PowerShell script will have to do.
    • I'm developing on Windows, and the appropriate C compiler for Python 2 extensions didn't like compiling gtemu.c (it seems that by default it interprets anything with a .c file extension as C89, which wants variables declared at the start of blocks, with no inline initialisation). I could have easily lost hours trying to work out the compiler flags to fix this. Instead I'm treating it as C++, which seemed like the path of least resistance. Some other minor issues were the absence of stdint.h, which I fixed with typedefs copy-pasted from StackOverflow, and needing some explicit casts from void *.

    The CFFI wrapping for gtemu.c is pretty neat actually; I hadn't used CFFI before. Here's an example of using it from iPython:

    In [1]: import _gtemu
    
    In [2]: f_rom = open('dev.rom', 'rb')
    
    In...
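
    For anyone who hasn't met CFFI, the build side of a wrapper like this is roughly as follows (out-of-line API mode; the declarations and file names here are illustrative rather than what my wrapper actually contains):

    from cffi import FFI

    ffibuilder = FFI()
    # Declare only the C functions the tests need to reach.
    ffibuilder.cdef("""
        void reset(void);
        void step(void);
    """)
    # Compile gtemu.c into the extension module itself.
    ffibuilder.set_source("_gtemu", '#include "gtemu.h"', sources=["gtemu.c"])

    if __name__ == "__main__":
        ffibuilder.compile(verbose=True)

    You write the cdef declarations once, CFFI generates and compiles the glue, and the resulting _gtemu module is importable like any other.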

  • NEXT

    Peter • 11/04/2019 at 22:38 • 0 comments

    My knowledge of Forth originally comes from the really fascinating jonesforth by Richard W.M. Jones, which I first saw through Lambda the Ultimate back in 2007. It takes the form of two files, one assembly file, which provides just enough machinery to stand up a Forth runtime, and one Forth file which builds upon it to provide a usable (although almost certainly incomplete) environment. The code is written in a literate style, being so heavily commented as to be at least as much a document as a program. I strongly recommend checking it out for an interesting read.

    This style of writing a small assembly kernel in order to bootstrap a Forth system is apparently a common approach, and it's this that makes me think that a native Gigatron Forth might be a tractable endeavour. Hopefully I can fairly quickly get to the point where I'm copying existing Forth implementations of core words instead of writing lots of ASM by hand.

    I'm also alive to the scepticism of the other posters on the Gigatron Forum, and must acknowledge that I'm likely to fail. This is also the first truly low-level programming I've ever done.

    I've also read (many times) the Moving Forth papers by Brad Rodriguez, which explain (amongst other things) the difference between a few important implementation strategies, particularly Direct Threaded and Indirect Threaded code (jonesforth follows the more traditional indirect threaded code model).

    I'm not going to try to re-explain any of what either of these excellent sources cover. To follow what is to come, you'll probably want to be familiar with Forth terminology like 'thread', 'word', etc. Ideally not too familiar though, as I'm likely to misuse the terms - I'm no Forth programmer myself!

    What both jonesforth and Moving Forth make clear is that the heart of a Forth implementation is 'NEXT', the code which dispatches to the next word in the currently executing thread, and moves the interpreter pointer (IP) forward. NEXT for the Gigatron could never be as short as the two instructions in jonesforth, because the instruction set is not as rich as the i386's, and ours will also be a lot more complicated because of the timing constraints.

    My Forth is effectively going to be a third virtual CPU for the Gigatron, and as such must follow the same rules as the vCPU and the v6502 - which are basically that you are told how much time to run for (in ticks - two clocks, which means two instructions), and you must run for that long, and not an instruction more or less. In the vCPU this is done by the dispatch loop (itself called NEXT - no coincidence, I'd guess) seeing if there's time for the longest possible op-code to complete. However we won't know what the longest 'native' Forth word is (well perhaps we could, but it could be pretty long) and we don't really want to scatter all of our words with book-keeping and continue/abort decisions. So my plan is to have each native word describe its worst-case runtime before starting, and NEXT will have to decide whether to enter it, or not. Upon completion the word will report the actual time taken (they could be different where there is code that loops or branches). This is similar to how SYS routines work. All of this book-keeping and decision making will make NEXT a lot more costly, and the shortest call to the vCPU is 36 ticks - so there isn't much time to get much of anything done!
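
    As a toy model of that decision (Python with invented names - the real thing is a handful of native instructions and branch chains, not code like this):

    from collections import namedtuple

    # A native word advertises its worst case up front, and reports the ticks it
    # really used once it has finished.
    Word = namedtuple("Word", "max_ticks execute")

    def next_dispatch(ticks_remaining, word):
        if word.max_ticks > ticks_remaining:
            return ticks_remaining               # not enough budget: don't enter the word
        return ticks_remaining - word.execute()  # spend what the word actually took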

    There's another fly in the ointment: to be usable as an interactive environment we need to be able to compile Forth code to RAM, and to make use of other Forth code in RAM, but native code needs to live in ROM, and we'll certainly want compiled Forth code in ROM as well. So sometimes we'll be executing a thread in RAM whose addresses point to other threads in RAM, sometimes we'll be executing a thread in RAM whose addresses point to words in ROM, and sometimes we'll be executing threads in ROM - our NEXT implementation needs to keep all of this straight.


    I've sketched out a design for all of this, which I'll post about...


  • Introduction

    Peter • 11/04/2019 at 21:15 • 0 comments

    I'm writing this retrospectively, as I've already done some work on this project.

    I've been taking a lot of interest in the Gigatron project since I saw a video about it on YouTube, and then watched the Hacker Hotel video explaining its fascinating architecture - I don't own one yet, but I'm hoping to have some money after Christmas!

    On the Gigatron forum a user posted about porting Forth to the Gigatron. I've also been interested in Forth, and stack-based language runtimes for a while, but haven't quite had the impetus to write one myself. The original post asked if people were interested, and questioned whether the vCPU, or v6502 was the better choice for implementing it, whilst leaning towards the v6502. I replied that I thought that the native 8-bit Gigatron instruction set would make for the cooler hack. Marcel and monsonite (the original poster) both thought that this was too hard. But I couldn't stop thinking about it... so here I am, setting out to write a ROM based native Forth for the Gigatron.

    Who knows if this will result in any useful code, but I already feel like I've learned a few things.

    The next update will be on what I've got so far, and my general approach.