The Road Much Less Travelled.

A project log for Prometheus A.I.

One set of rings that will control all.

glgorman 07/08/2022 at 05:55

I cooked up an Eliza-based Pascal source tokenizer and tried it out to see how well it handles some of the initial steps in converting the Pascal compiler to C++.  The initial results are a bit cringe-worthy, but they are not a complete disaster either.  So I got really aggressive about creating a debugging environment for both the Eliza-based tokenizer and the original one, and the results from the two together are looking quite promising.  First, a glimpse of the Eliza-based method.

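void PASCALCOMPILER::SOURCE_DUMP ()
{
    ELIZA eliza;
    text_object source;
    char *buff1, *buff2;
    int line;
    if (SYSCOMM::m_source==NULL)
    {
        WRITELN(OUTPUT,"NULL source file");
        return;
    }
    if ((*SYSCOMM::m_source).size()==0)
    {
        WRITELN(OUTPUT,"Empty source file");
        return;
    }
    // Walk the source one line at a time.  Stop at the end of the
    // container or at the first NULL line, whichever comes first, so
    // that a NULL line is never handed to the tokenizer.
    for (line=0; line<(int)(*SYSCOMM::m_source).size(); line++)
    {
        buff1 = (*SYSCOMM::m_source)[line];
        if (buff1==NULL)
            break;
        source = buff1;
        buff2 = NULL;
        eliza.process = source;
        // Apply the pascal2c substitution rules to the current line.
        eliza.pre_process (pascal2c);
        // operator>> is assumed to hand back a newly allocated buffer.
        eliza.process >> buff2;
        WRITE(OUTPUT,buff2);
        // Must match however operator>> allocates the buffer.
        delete buff2;
    }
}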

The mostly complete source for this mess can, of course, be found in the GitHub repositories for this project, and it will be updated regularly.  Be very afraid.  Use at your own risk.  Guaranteed to contain LOTS of bugs.  On the other hand, I also wrote a bunch of debugging code that inspects each symbol as it is parsed and selects, for example, everything from each occurrence of the keyword PROCEDURE up to the first SEMICOLON encountered thereafter.  That yields a very promising result, which looks (in part) like this.

12762: PROCEDURE
12763:  "ASSIGN"
12764: (
12765:  "EXTPROC"
12766: :
12767:  "NONRESIDENT"
12768: )
12769: ;

12859: PROCEDURE
12860:  "GENJMP"
12861: (
12862:  "FOP"
12863: :
12864:  "OPRANGE"
12865: ;

13012: PROCEDURE
13013:  "LOAD"
13014: ;

13017: PROCEDURE
13018:  "GENFJP"
13019: (
13020:  "FLBP"
13021: :
13022:  "LBP"
13023: )
13024: ;

13048: PROCEDURE
13049:  "GENLABEL"
13050: (
13051: VAR
13052:  "FLBP"
13053: :
13054:  "LBP"
13055: )
13056: ;

13078: PROCEDURE
13079:  "PUTLABEL"
13080: (
13081:  "FLBP"
13082: :
13083:  "LBP"
13084: )
13085: ;

13175: PROCEDURE
13176:  "LOAD"
13177: ;

13469: PROCEDURE
13470:  "STORE"
13471: (
13472: VAR
13473:  "FATTR"
13474: :
13475:  "ATTR"
13476: )
13477: ;

13560: PROCEDURE
13561:  "LOADADDRESS"
13562: ;
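
The selection behind a dump like this can be sketched in a few lines of C++.  This is only a minimal illustration, not the code from the repository: the Token struct and the select_spans function are hypothetical names, and it assumes the tokenizer has already produced a flat list of indexed tokens.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record; index corresponds to the numbers in the dump.
struct Token {
    std::size_t index;
    std::string text;
};

// Collect every run of tokens that starts at `keyword` and ends at the
// first `terminator` that follows it (both ends included).  A run that
// reaches the end of the input without a terminator is still returned.
std::vector<std::vector<Token>> select_spans(const std::vector<Token>& tokens,
                                             const std::string& keyword,
                                             const std::string& terminator)
{
    std::vector<std::vector<Token>> spans;
    std::size_t i = 0;
    while (i < tokens.size()) {
        if (tokens[i].text != keyword) {
            ++i;            // not the start of a span, keep scanning
            continue;
        }
        std::vector<Token> span;
        while (i < tokens.size()) {
            span.push_back(tokens[i]);
            if (tokens[i++].text == terminator)
                break;      // span is complete
        }
        spans.push_back(span);
    }
    return spans;
}
```

Calling select_spans(tokens, "PROCEDURE", ";") on the token stream would return one span per procedure heading, which is roughly what each group in the listing above represents.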

Now, without taking another digression into the meaning of the word SELECT and what it might mean in the context of relational databases, it should be easy to see where this leads.  If all we do is tokenize the input and then select sub-sections according to certain properties, the result looks like something that might be handled quite easily by some kind of #define TYPEGLOB_REORDER(A, B, C, ...) macro, even though I am not, at this point, trying to do a pure preprocessor macro-based language scheme.  Somewhere, over the rainbow, maybe someday?
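
To make the macro idea a little more concrete, here is a toy sketch of what such a reordering might look like for a one-parameter procedure heading.  Everything in it is an assumption made for illustration: PARAM_REORDER, PARAM_VAR_REORDER and PROCEDURE_REORDER are hypothetical names (not the TYPEGLOB_REORDER macro mentioned above), LBP is stood in for by a plain int, and last_label exists only so the effect can be observed.

```cpp
// Pascal writes a parameter as "NAME : TYPE"; C++ wants "TYPE NAME".
#define PARAM_REORDER(NAME, TYPE) TYPE NAME
// A Pascal VAR parameter is passed by reference.
#define PARAM_VAR_REORDER(NAME, TYPE) TYPE &NAME
// Hypothetical heading macro: PROCEDURE PROC(NAME : TYPE) becomes
// void PROC(TYPE NAME).
#define PROCEDURE_REORDER(PROC, NAME, TYPE) \
    void PROC(PARAM_REORDER(NAME, TYPE))

using LBP = int;            // placeholder type, for this sketch only
static int last_label = 0;  // placeholder state, for this sketch only

// Expands to the heading: void GENFJP(LBP FLBP)
PROCEDURE_REORDER(GENFJP, FLBP, LBP)
{
    last_label = FLBP;
}

// Expands to the heading: void GENLABEL(LBP &FLBP)
void GENLABEL(PARAM_VAR_REORDER(FLBP, LBP))
{
    FLBP = last_label + 1;
}
```

The catch is that the preprocessor cannot split "FLBP : LBP" on the colon by itself, so something like the token selection step would still have to separate the pieces before a macro like this could reorder them.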
