Prometheus A.I.

One set of rings that will control all.

What if we could add a kind of "digital DNA" to a classic language like UCSD Pascal, with an eye toward developing an efficient platform for projects that combine feature sets from several other programming languages and styles, and the functionality of multiple devices, within a single cohesive framework? In effect, such a framework might allow an aspiring robot designer to write an application in a high-level language such as Pascal, and then cross-compile that application to an intermediate language such as a specialized variant of LISP. That intermediate form could then either be implemented as a C++ library which provides some of the features of LISP, or else another meta-compiler might be used to convert the intermediate representation to run on a microcontroller such as a Propeller 2, using the built-in FORTH interpreter, native assembly, or even traditional UCSD p-code.

Several approaches are frequently taken when developing projects that involve some type of AI.  In the traditional approach, interaction with a simulated intelligence is produced by combining simple pattern matching techniques with some type of scripting language, which in turn provides a seemingly life-like experience that can be highly effective within some contexts, even if only up to a certain point.  This is the approach taken by classic chatbots such as ELIZA, PARRY, MEGAHAL, SHRDLU, and so on.   Whether this type of AI is truly intelligent is a subject for debate, and arguments can be made both for and against.  Nobody can reasonably claim that such systems are in any way sentient, yet WHEN they work, they tend to work extremely well.
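As a rough illustration of the "pattern matching plus script" idea, here is a minimal sketch of a keyword-driven responder. The names `rule_sketch`, `upper_sketch`, and `respond_sketch`, and the three rules in the table, are made up for this example and are not taken from any actual ELIZA script or from this project's code.

```cpp
#include <cctype>
#include <string>

// Hypothetical rule: if the keyword appears in the input, use the reply.
struct rule_sketch {
    const char *keyword;
    const char *reply;
};

// Upper-case a copy of the input so that matching ignores case.
std::string upper_sketch(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Return the reply of the first rule whose keyword occurs in the input,
// or a default prompt when nothing matches.
std::string respond_sketch(const std::string &input) {
    static const rule_sketch rules[] = {
        {"MOTHER", "TELL ME MORE ABOUT YOUR FAMILY"},
        {"ALWAYS", "CAN YOU THINK OF A SPECIFIC EXAMPLE"},
        {"WHAT",   "WHAT ANSWER WOULD PLEASE YOU THE MOST"},
    };
    const std::string text = upper_sketch(input);
    for (const rule_sketch &r : rules)
        if (text.find(r.keyword) != std::string::npos)
            return r.reply;
    return "PLEASE GO ON";
}
```

Calling `respond_sketch("My mother hates me")` selects the first rule. The matching here is a plain substring search, so "somewhat" would also trigger the "WHAT" rule; a real script engine would tokenize the input and rank its keywords.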

Most recent attempts at developing AI seem to be focused on applications that more accurately model some of the behaviors associated with the neural networks found in actual biological systems.  Such systems tend to be computationally intensive, often requiring massively parallel computing architectures which are capable of executing billions of concurrent, as well as pipelined, non-linear matrix transformations so as to perform even the simplest simulated neuronal operations.  Yet this approach gives rise to so-called learning models that might not only recognize puppies, etc., but might also be built to tackle more esoteric problems such as certain issues in bio-molecular research, mathematical theorem proving, and so on.

Thus, the first approach seems to work best for problems that we know how to solve, and this method, therefore, leads to solutions, that - when they work - are both highly efficient, as well as provable, with the main issues being the amount of work that goes into content creation, as well as debugging and testing.

The second approach seems to offer the prospect of allowing for the creation of systems that are arguably crash-proof, at least in the sense that it should be possible to build simulations of large neural networks, that are just massively parallelized as well as pipelined matrix algebraic data flow engines, which from a certain point of view, is simplicity in and of itself.  So that would of course seem to imply that from at least one point of view, the hardware can be made crash-proof, that is within reasonable limits, even if an AI application running on such a system might hang from the point of view of the case where the proposed matrix formulation according to some problem of interest fails to settle on a valid eigenstate.

So, let's invent a third approach, based on introducing some type of neural network of the second type that can hopefully be conditioned to create script engines of the first type.  Not that others haven't tried doing this with so-called hidden Markov models, which just as often introduce some kind of Bayesian inference into some hierarchical model.  Thus, there have been many attempts at this sort of thing, often with interesting, even if at times somewhat nebulous, results, i.e., WATSON, OMELETTE.  So, obviously - something critical is still missing!

Now as it turns out, the human genome consists of about 3 billion base pairs of DNA, each of which encodes up to two bits of information - which might therefore fit nicely in about 750 megabytes for a single set of up to 23 chromosomes, if it can be stored that is, in a reasonably efficient, but uncompressed form.  Now if it should turn out that 99% of this does not code for any actual proteins, then it might very well be that all of the actual information needed to encode the proteins that go into every cell in the human body, well that information might only need a maximum of about...

Read more »

prometheus.pdf

A draft copy of the text of the Description, Details, and Project Logs for this project in pdf form. Enjoy!

Adobe Portable Document Format - 225.17 kB - 07/14/2022 at 12:12

pascal_procedures.pdf

A somewhat more condensed version of the list of the procedures in the original UCSD Pascal compiler source code. Numbers next to the procedure name are not line numbers, rather they are the symbol numbers based on a version of the tokenizer that captures each symbol's index number as it is parsed. Quotation marks are added when the debugging code reports on the name of a variable, or string literal.

Adobe Portable Document Format - 9.59 kB - 07/08/2022 at 07:57

UCSD Pascal 1.5 Compiler.pdf

This is the source code for the original UCSD Pascal 1.5 Compiler which was released for free, non-commercial and/or educational use by the University of California, San Diego sometime around 2003. I have taken the original text file from the UCSD distribution and added line numbers for reference purposes, as well as converting it to pdf format.

Adobe Portable Document Format - 221.37 kB - 06/27/2022 at 02:09

  • Beyond the Labyrinth - Through the Looking Glass:

    glgorman01/09/2023 at 04:48 0 comments

    There is no other way that I can think of to describe it.  It goes like this.  I managed to get simple assignments working with the RECORD type in the C++ version of the Pascal compiler, but what an uphill battle that was, and yet I am still facing an error <158> "No such variant or record" when the C++ version of the compiler tries to compile the original Pascal source.  So I began implementing completely new versions of the "structure" and "identifier" RECORD types, which I am integrating into my FRAME-LISP library.  In the case of the "identifier" class, I have made some progress with using the so-called "curiously recurring template pattern" (CRTP) so as to get the identifier class to inherit from, as well as be encapsulated by, my bTreeType<X> template class.  Then with the structure class, I am also experimenting with a class factory approach based on static polymorphism, in order to create class hierarchies that can be both nested and inherited, so as to support suitable class types for such necessities as object::scalar, object::pointer, etc.
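    For readers who have not met the pattern, the following is only a sketch of how an identifier record might inherit its tree links from a template that is parameterized on the record itself. The names `tree_node_sketch` and `ident_node` are hypothetical; the real bTreeType<X> and identifier classes are not shown in this log and may be organized quite differently.

```cpp
#include <cstring>

// Hypothetical CRTP base: the links are typed as Derived*, so the payload
// and the tree node are one object and no virtual functions are needed.
template <class Derived>
struct tree_node_sketch {
    Derived *left = nullptr;
    Derived *right = nullptr;

    Derived *self() { return static_cast<Derived *>(this); }

    // Insert 'item' somewhere below this node. The ordering is delegated
    // to Derived::compare, which is resolved at compile time.
    void insert(Derived *item) {
        Derived *&branch = (item->compare(*self()) < 0) ? left : right;
        if (branch)
            branch->insert(item);
        else
            branch = item;
    }

    // Number of nodes in the subtree rooted at this node.
    int count() const {
        return 1 + (left ? left->count() : 0) + (right ? right->count() : 0);
    }
};

// Hypothetical identifier record that is its own tree node.
struct ident_node : tree_node_sketch<ident_node> {
    char name[9];  // 8 significant characters plus a terminator (assumed size)

    explicit ident_node(const char *s) {
        std::strncpy(name, s, 8);
        name[8] = '\0';
    }

    int compare(const ident_node &other) const {
        return std::strcmp(name, other.name);
    }
};
```

    With a root named "M", inserting "INTPTR" lands in the left branch and "REALPTR" in the right one, and every node is a complete identifier rather than a separate node that points at one.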

    So far so good, or not so good:

    Really, it's probably just another case of mis-nested curly braces somewhere that I need to ferret out, either by finishing my Eliza-based code analyzer, or else by doing a complete heap walk in order to see if the actual problem is in the declaration part.  The best way to do a proper heap walk, of course, is to completely re-implement all of the data structures in some parallel context.  This is why I need to implement an object factory, and so on.  And then, like, isn't this all of a sudden, no maybe, I have actually been down this road before, but wait a moment, what exactly is THIS?!?

    So I open up the Pascal source for the compiler as a text file in Visual Studio, with the debugger stopped at the point where the same line of code is being compiled by the C++ version, so that I can try to figure out what exactly is the problem.  Then I just so happen to position the mouse over the name of an identifier, not in the C++ code that is being debugged, but in the Pascal code that is being compiled, and - say what?  Visual Studio recognizes the symbol INTPTR and I get this popup list!  Wait a minute!  Maybe if I was debugging a fence post problem in some kind of array indexing example, and maybe if I opened "The Adventures of Tom Sawyer" in Visual Studio while debugging an application that used the label "fence" as a parameter for some range type, then I would get some kind of pop-over, because Visual Studio is just guessing that something might be relevant.  Yeah, so beyond the Labyrinth and Through the Looking Glass, indeed.

    Now, this gets me thinking about equivalence classes, and some philosophical reasoning regarding the question "What exactly is an analogy?" Is a proper analogy formed when two or more ontological constructs share at least one equivalence class, in at least one context - irrespective of whether that particular equivalence is strong or weak?  Yeah, I think I know where this train is headed. 

    Yet for whatever it's worth, maybe this is a good time to mention that the very first PASCAL program that I ever wrote, besides some Hello World demo was a program to help create strategy tables for the game of Blackjack.  Now as it turns out, not only do I STILL have my original Apple II floppies, but at some point a long, long time ago, I took the time to copy at least some of that stuff over to a Macintosh, of which, of course, at least some of THAT material got archived onto a PC format, thanks to Windows NT 4.0 Macintosh file sharing over Ethernet.

    So I finally have gotten to the point that I can try compiling some VERY old code that I wrote - yes THAT long ago for UCSD Pascal on the Apple II!  Unfortunately, the new compiler isn't fully functional quite yet.  However, I am pleased to report that...

    Read more »

  • Curiouser and Curiouser: But not quite that Curious - Yet

    glgorman12/31/2022 at 02:26 0 comments

    For those who actually read this stuff, you should know by now that I always try to come up with a catchy title for my log entries, one that just as often as not reflects on, or hints at, some famous movie quote, or alludes to an excerpt from some song lyric, or else quotes, or perhaps misquotes, some other famous piece of literature.  So why should this post be any different?  Especially if you know that there is a template pattern method that is sometimes used in C++ to allow member functions in an instance of a base class to explicitly call member functions contained in a derived class, without requiring the use of virtual functions - and this method is usually referred to as the "curiously recurring template pattern."

    Well, I'm not quite doing anything with that, just quite yet - but something got to my curiosity nonetheless, so that, well something just might be leading in that direction.  Earlier I was messing around with a C++ port of Eliza, and testing a template-based method for constructing binary trees, which would be nice to have for use with lots of things.  So I wrote a quick hack that stores the Eliza strings in Pascal-like strings, which are usually just fixed-length packed arrays of char indexed 0..255.  Yet rather than allocate a binary tree node that contains a pointer to the data object, in this case, a "string", I decided to encapsulate the data object and the tree node in a single monolithic object, even if I am not using the curiously recurring pattern, quite yet.  So the existing code looks like this:

    bTreeType<pstring> *ELIZA::make_tree (char **list)
    {
        size_t sz1 = sizeof (bTreeType<pstring>);
    //    69632 bytes = p-system SANDBOX
        size_t sz2 = sz1*256;  
        bTreeType<pstring> *ptr = (bTreeType<pstring>*) sandbox::allocate_p (sz2);
        void *ptr2;
        bTreeType <pstring> *bTree[256];
        int i;
        for (i=0;i<256;i++)
        {
    //    if the pointer arithmetic works the way it is supposed
    //    to, we should get an array which is 69632 bytes 
    //    which contains 256 binary tree nodes - each of which
    //    directly encapsulates an array of 256 bytes of char  
            ptr2 = (void*) (ptr+i);
            bTree[i] = new (ptr2) bTreeType<pstring>;
            if (i<114)
                bTree[i]->m_pData = list[i];
        }
        return ptr;
    }

    And as described in an earlier post, it appears to be working as it is supposed to.  So now I am looking into the possibility of doing a code merge which could be used to combine the binary tree structures that are used in the UCSD Pascal compiler to store identifiers, for example - with the generic binary tree template methods that I wrote some 27 years ago.  So the first step as I figure it is to replace all calls to operator new in the compiler source with calls to identifier::allocate, which in turn now calls the placement version of operator new - so that I can begin experimenting with performing all major data structure allocations in a special Pascal sandbox.  This is also in anticipation of wanting to make a code base that is more emulator or microcontroller friendly, since in those situations we will have an explicit requirement for managed allocations.  Thus, the identifier::allocate overloads now look like this:

    identifier *identifier::allocate ()
    {
        identifier *id;
        void *pascal_heap = NULL;
        id = new (pascal_heap) identifier;
        if (identifiers::m_bTracing==true)
            debug1 (id,true);
        return id;
    }
    
    identifier *identifier::allocate (IDCLASS idclass)
    {
        identifier *id;
        void *pascal_heap = NULL;
        id = new (pascal_heap) identifier;
        id->KLASS = idclass;
        if (identifiers::m_bTracing==true)
            debug1 (id,true);
        return id;
    }
    
    identifier *identifier::allocate (char *str, STP ptr, IDCLASS idclass)
    {
        identifier *id;
        void *pascal_heap = NULL;
        id = new (pascal_heap) identifier;
        if (str!=NULL)
            strcpy_s(id->NAME,IDENTSIZE,str);
        id->IDTYPE = ptr;
        id->KLASS = idclass;
        if (identifiers::m_bTracing==true)
            debug1 (id,true);
        return id;
    }
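    The allocators above pass a null pointer to placement new, which only makes sense if the class supplies its own placement form of operator new. One way such an overload could behave is sketched below, purely as an assumption: `sandbox_sketch`, its fixed 1024-byte pool, `ident_sketch`, and `allocate_ident` are hypothetical names, and the project's actual implementation may work differently.

```cpp
#include <cstddef>
#include <new>

// Hypothetical bump allocator standing in for the "Pascal sandbox".
namespace sandbox_sketch {
    alignas(std::max_align_t) unsigned char pool[1024];
    std::size_t used = 0;

    // Hand out the next aligned chunk of the pool, or nullptr when full.
    void *allocate(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        const std::size_t rounded = (n + a - 1) / a * a;
        if (rounded > sizeof(pool) - used)
            return nullptr;
        void *p = pool + used;
        used += rounded;
        return p;
    }
}

struct ident_sketch {
    char NAME[9];
    int KLASS;
    ident_sketch() : KLASS(0) { NAME[0] = '\0'; }

    // Class-level placement form: a null hint means "take space from the
    // sandbox", while a non-null hint is used as the storage directly.
    static void *operator new(std::size_t size, void *where) {
        void *p = where ? where : sandbox_sketch::allocate(size);
        if (!p)
            throw std::bad_alloc();
        return p;
    }

    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void *, void *) noexcept {}
};

// Mirrors the shape of identifier::allocate () in the listing above.
ident_sketch *allocate_ident() {
    void *pascal_heap = nullptr;
    return new (pascal_heap) ident_sketch;
}
```

    In this sketch nothing is ever freed individually, which suits a compiler that releases its whole heap at once, but it would not suit long-running code.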

     And for now - the version of placement new that I added to the identifier...

    Read more »

  • And a Very Merry (Blaise), Christmas to You Too.

    glgorman11/28/2022 at 03:32 0 comments

    As previously discussed, the C++ port of the UCSD Pascal compiler now generates something that looks like a binary image, even though I don't have a p-machine to run it on, quite yet.  (Easy weekend project?) Yet that doesn't mean that these waters aren't (hopefully) swarming with types that would enjoy a core dump, or a hex dump or two, even if just for proof of concept.  Right now I think that simple assignments, BEGIN-END, CASE, FOR, GOTO, IF, REPEAT, and WHILE are working, but accessing members of RECORD types and use of the WITH statement are still broken.  But in any case: here is a simple test program for proof of concept.

    (* TEST PROGRAM *)
    (*$T+*)
    (*$U-*)
    
    PROGRAM TEST;
    
     TYPE
        DATETIME = RECORD
            YEAR, MONTH, DATE: INTEGER;
            HOUR, MINUTE, SECOND: INTEGER;
        END;
     
     VAR HUEY, DEWEY, LOUIE: REAL;
        INKY, BLINKY, BASHFUL: INTEGER;
        LUKE, R2D2, C3PO, OBIWAN: BOOLEAN;
           
     PROCEDURE tree;
     
      BEGIN
        WRITELN(OUTPUT,'Where there are birds, there should be trees.');
        WRITELN(OUTPUT,'And hopefully, there should also be eggs!');
        WRITELN(OUTPUT,'A three egg omelette would be nice.');
        WRITELN(OUTPUT,'Ah, but what if there was a snake in the tree?');
      END;
     
     PROCEDURE stormy;
     
     VAR clouds, wind, rain: BOOLEAN;
     
     BEGIN
        (* In the "Big Inning" God created Baseball! *)
        INKY := 1;
        BLINKY := 2;
        BASHFUL := -3;
        
        WRITELN(OUTPUT,'So there I was ... ');
        WHILE INKY<5 DO
        BEGIN
          CASE INKY OF
            1: WRITELN(OUTPUT,'It was a dark and stormy night.');    
            2: WRITELN(OUTPUT,'And as the swamp thing staggered from');
            3: WRITELN(OUTPUT,'the crypt .... ');
            4: WRITELN(OUTPUT,'Suddenly there was a need for words.')
         END;
         rain := true;
         INKY:=INKY+1
       END; 
     END;
       
     PROCEDURE oasis;
     
        VAR    msg1, msg2, msg3, msg4,
        msg5, msg6: PACKED ARRAY [1..255] OF CHAR;
    
     BEGIN    
        msg1 := 'Even as the vultures circled overhead';
        msg2 := 'I pressed onward into the night';
        msg3 := 'Refusing to give up hope';
        msg4 := 'The birds had to have come from somewhere, right?';
        msg5 := 'Right!';
        msg6 := 'And then suddently ... an oasis!'; 
     
        WRITELN(OUTPUT,msg1);
        WRITELN(OUTPUT,msg2);
        WRITELN(OUTPUT,msg3);
        WRITELN(OUTPUT,msg4);
        WRITE(OUTPUT,msg5,' ');
        (* WRITE(OUTPUT,' '); *)
        WRITELN(OUTPUT,msg6);
     END;
     
     PROCEDURE loopy;
     
     VAR I, J, K:INTEGER;
     
      BEGIN
        I:=100;
        IF I=100 THEN
            WRITE(OUTPUT,I);
        FOR J:=0 TO 10 DO
        BEGIN
            WRITE (OUTPUT,'This is a test: ');
        WRITELN (OUTPUT,J)
        END;
        HUEY := HUEY+1;
        K := I+J;
        (* End of This is a test *)    
     END;
     
     BEGIN
        HUEY := 1;
        WRITELN(OUTPUT,'Stormy');
        stormy;
        WRITELN(OUTPUT,'Oasis');
        oasis;
        WRITELN(OUTPUT,'Tree');
        tree;
        WRITELN(OUTPUT,'Loopy --- loop: ',HUEY);
        REPEAT
            loopy;
        UNTIL HUEY=5;
        WRITELN(OUTPUT,'This is only a test');
     END.
     

     And here is a hex dump of the compiler output.   Warning - there are still LOTS of bugs.  Enjoy!

    00000000: b60103a62d5768657265207468657265 --> "?!#ª-Where there"
    00000010: 206172652062697264732c2074686572 --> " are birds, ther"
    00000020: 652073686f756c642062652074726565 --> "e should be tree"
    00000030: 732e00cd0013b60103cd0016b60103a6 --> "s. ? 3?!#? 6?!#ª"
    00000040: 29416e6420686f706566756c6c792c20 --> ")And hopefully, "
    00000050: 74686572652073686f756c6420616c73 --> "there should als"
    00000060: 6f20626520656767732100cd0013b601 --> "o be eggs! ? 3?!"
    00000070: 03cd0016b60103a62341207468726565 --> "#? 6?!#ª#A three"
    00000080: 20656767206f6d656c6574746520776f --> " egg omelette wo"
    00000090: 756c64206265206e6963652e00cd0013 --> "uld be nice. ? 3"
    000000a0: b60103cd0016b60103a62e41682c2062 --> "?!#?...
    Read more »

  • My "First" Pascal Program?

    glgorman08/18/2022 at 02:34 0 comments

    Well, you know what I mean.  I mean my first Pascal Program with the new compiler!  Sort of.  In any case, I am finally generating something that at least looks like a binary image, even if there are still quite a few bugs - and even that should be taken as an understatement.  Yet I am VERY close to having a p-code image that might run on genuine Apple II hardware, or on an emulator - or, wouldn't it be nice to have Apple II GS emulation on the Parallax Propeller P2?  Even though I will need to write a p-code interpreter for that, that should be pretty easy by comparison.

    There is after all that other story about the other three bears.  The one that says that if a bear ever confronts you, you should consider the following advice:  If the bear is brown, lay down.  If the bear is black, fight back.  If the bear is white, good night!   Remember, it's your life, and quite possibly your death. 

    As I said, there is still lots of work to do, and lots of bugs, so this code is still too preliminary to assign a version number.  But you can find the code on GitHub anyway.  BE VERY AFRAID.

  • What answer would please you the most?

    glgorman08/08/2022 at 02:52 0 comments

    I found the original source code for Eliza, as it was written in MAD SLIP.  Unfortunately, when I tried OCR in Adobe Acrobat, the text came out hopelessly garbled.  So the original Eliza script looks something like this - if you let the OCR pretend that it knows what it is doing.

    (DOES IT PLEASE YOU TO BELIEVE THAT) 
    C::O YO!..: SOMETI.'·~ES i..JI SH YCU H::R::- 4) 
    (PE~~AP$ YOlJ VlO~LO LIKE TO 3E 4)) 
    ( { 0 ! D YJU) {iil-lY DO YOU Tf-ll \K 1 3 y~~) 
    (YO:...: LIKE TO THINK I 3 YOU - CO~~ 1 T Y-2\...}

    Now I am not going to jump on the bandwagon that is going around (just yet) about some notion that because of AI, computers have invented their own language, which scientists can't understand.  Really, it's just garbled scanned text.  Or at least it is for now.  Yet what if we invented a programming language where there is no such thing as a syntax error, so that anything will compile, and possibly execute?  Obviously, if Conway's Game of Life is known to be Turing complete, then, at least in principle, it should be possible to implement some type of ostensibly sentient A.I. that might work like ELIZA or GPT-N, that is when equipped with some kind of learning mode that allows the user to explain to it things like the fact that "a boat might be a kind of thing that fills a hole in the water that you throw money into.", and so on.  This has already led to some dubious results for others, which I won't go into further - here, just yet.

    Instead, I have taken to the task of retyping the original MAD SLIP source code for Eliza into a fresh text file, hopefully with far fewer garbled characters than what Adobe seems to be able, or otherwise completely unable to do.

    CHANGE MAD
              EXTERNAL FUNCTION (KEY,MYTRAN)
              NORMAL MODE IS INTEGER
              ENTRY TO CHANGE.
              LIST.(INPUT)
              V'S G(I)=$TYPE$,$SUBST$,$APPEND$,$ADD$,
              1$START$,$RANK$,$DISPLAYA$
              V'S SNUMB = $ I3 *#*$
              FIT=0
    CHANGE    PRINT COMMENT $PLEASE INSTRUCT ME$
              LISTD.(mtlist.(INPUT),0)
              JOB=POPTOP.(INPUT)
              T'H IDENT, FOR J=1,1, J.G. 7
    IDENT     W'R G(J) .E. JOB, T'O THEMA
              PRINT COMMENT $CHANGE NOT RECOGNIZED$
              T'O CHANGE
    THEMA    W'R J .E. 5, F'N IRALST.(INPUT)
             W'R J .E. 7
                  T'H DISPLA, FOR I=0,1, I .G. 32
                  W'R LISTMT.(KEY[I]) .E. O, T'O DISPLA
    READ(7)          S=SEQDR.(KEY[I])
                  W'R F .G. O, T'O DISPLA
                  PRINT COMMENT $*$
                  TPRINT.(NEXT,0)
                  PRINT FORMAT SNUMB,I
                  PRINT COMMENT $ $
                  T'O READ(7)
    DISPLA        CONTINUE
                  PRINT COMMENT $ $
                  PRINT COMMENT $MEMORY LIST FOLLOWS$
                  PRINT COMMENT $ $
                  T'H MEMLST, FOR I=1 , 1, I .G. 4
    MEMLST        TXTPRT.(MYTRAN(I),0)
                  T'O CHANGE
              E'L

    Now actually, there are about seven pages of this stuff that I am in the process of cleaning up, which of course adds yet one more TODO to my bucket list, and that would of course be - why not implement "just enough MAD-SLIP" to run on an Arduino or a Propeller, so we can finally have the "original", or at least feel a little closer to the metal than some JavaScript implementations, no matter how nice, etc.  The IBM 7090 had 32768 words of core memory, each 36 bits wide, or roughly 144K bytes.  So that sort of thing seems reasonably doable - at least as far as memory footprint requirements are concerned, and not according to whether there should be any need to actually compile and run this on an IBM 7090 in emulation, even if stuff like that exists in Hercules.  No thank you, at least not yet.  I don't really like anchovies either.

    Yet here we can easily see where ELIZA had a mode, where she could say "Please Instruct Me", or "Change not recognized?"  So there WAS a learning mode!  Or there was one in the works!  Now the challenge begins to take on a different flavor, and that is not just to implement LISP, MAD SLIP, Pascal, and C, and so on - in a suitable microcontroller environment - but to REALLY "Hack it back", by perhaps getting "just enough Turing completeness" into the C/C++ pre-preprocessor, or EBNF lexer,...

    Read more »

  • Through the Maze, Down the Rabbit Hole, Into the Labyrinth

    glgorman07/18/2022 at 19:24 0 comments

    It goes something like this.  

    PASCALCOMPILER::THREAD_ENTRY gets called.  That calls PASCALCOMPILER::COMPILER_MAIN, which leads to PASCALCOMPILER::BLOCK, which in turn brings us into PASCALCOMPILER::BODY, which gets us into BODYPART::MAIN via the "rabbit hole" of doing a reinterpret_cast on the compiler object, which in turn was Frankensteined together via a custom allocator rather than with the regular constructor.  That shouldn't be possible if the inheritance is based on a set of virtual base classes, but it does seem to be working, for now.  Then we cross the Rubicon into the Labyrinth that consists of mutually recursive calls back into DECLARATIONPART::MAIN, then back up to PASCALCOMPILER::BLOCK (!), followed by a trip into PASCALCOMPILER::DECLARATIONS, which somehow finds DECLARATIONPART::MAIN again, where it finally falls to the center of the earth looking for a symbol that it doesn't find, via a SKIP loop - finally emitting ERROR 18 "Error in Declaration Part".

    O.K., so now we have proof of concept: Pascal-style mutually recursive nested functions in C/C++, even though "local procedures are illegal" in C.

    Time to say oh, la, la! and order a deluxe pizza?   Maybe.  Even though it is still quite a way off before I will be generating a binary that will run on the Propeller, Arduino, or a TTL-NOR computer, for that matter.  Yet there is something interesting that comes to mind.   Most of the code conversion was done in a word processor, using find and replace and then manually editing the prototypes, as previously discussed.  So I know there are plenty of other places where some block might not have gotten an extra left or right curly brace to properly match up and properly format some DO WITH this WHILE that mess, that might also have nested cases, etc.  So I simply added or deleted braces by eye, often without looking at the original source, just to get it to compile!  I mean, so what?  Just give it the brains of Abby Normal, right?  Maybe Eliza could do a better job.

    Maybe Eliza COULD do a better job!  Right now Eliza is pretty good at substitutions, but not so good at permutations, and re-arrangements.  Yet clearly something a little more sophisticated than diff and patch is needed, and yet grep isn't quite the right answer either.  Obviously, one could grep out the FOR loops, and the IF THENS, etc, and then one could compare the original code with the translated code, but I don't think that grep knows how to do that.  That's more like the task of running two or more lexers side by side, each of which is somehow an "expert" at some part of the structure while ignoring constructs that it doesn't understand.

    Sort of like what GPT-3 tries to do - predict the next word in a sentence, based on an analysis of hundreds of gigabytes of text, whether it is standard English or code.  Then there is also GitHub Copilot - which I haven't tried yet - I don't know if it is even up and running, and in any case - can it use its AI to compare a mostly correct but a bit buggy Pascal source with the brand new but still seriously broken C version of the same program?  Probably not.
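    To make the side-by-side lexer idea a little more concrete, here is a deliberately crude sketch that counts a few control-flow keywords in a Pascal source and in its C translation, and then reports how many of the paired constructs disagree. All of the names (`count_words`, `lookup`, `count_mismatches`) and the choice of keyword pairs are assumptions made for this example; it does not skip comments or string literals, and it is not the Eliza-based analyzer mentioned in these logs.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, int> word_counts;

// Split a source text into identifier-like words and count each one.
// Pascal is case-insensitive, so its words can be folded to upper case.
word_counts count_words(const std::string &src, bool fold_case) {
    word_counts counts;
    std::string word;
    for (std::size_t i = 0; i <= src.size(); ++i) {
        // One extra pass with a blank flushes the final word.
        unsigned char c = (i < src.size()) ? static_cast<unsigned char>(src[i]) : ' ';
        if (std::isalnum(c) || c == '_') {
            word += static_cast<char>(fold_case ? std::toupper(c) : c);
        } else if (!word.empty()) {
            ++counts[word];
            word.clear();
        }
    }
    return counts;
}

// Number of times a word was seen, or zero if it never appeared.
int lookup(const word_counts &counts, const char *word) {
    word_counts::const_iterator it = counts.find(word);
    return it == counts.end() ? 0 : it->second;
}

// Count how many paired constructs occur a different number of times.
// WHILE is left out because a C "while" also closes every do-loop.
int count_mismatches(const std::string &pascal_src, const std::string &c_src) {
    static const char *const pairs[][2] = {
        {"FOR", "for"}, {"IF", "if"}, {"CASE", "switch"}, {"REPEAT", "do"},
    };
    const word_counts p = count_words(pascal_src, true);
    const word_counts c = count_words(c_src, false);
    int mismatches = 0;
    for (const auto &pair : pairs)
        if (lookup(p, pair[0]) != lookup(c, pair[1]))
            ++mismatches;
    return mismatches;
}
```

    A non-zero result for a single procedure and its translation would at least flag where a brace was added "by eye" around the wrong block, which is the kind of hint that diff and grep do not give directly.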

    Yet - I knew that there was a reason for creating THIS mess, when I did, back around mid-summer 1997.

    class text_object
    {
        CCriticalSection critical;
    
    public:
        bool m_bEnd;
        UINT    m_code_page;
        node_list<char*> m_nList;
        node<char*> *m_nPos;
        s_node<char*, language> m_sList;
    
    public:
        text_object ();
        text_object (char *m_pText);
        text_object (const text_object &copy);
        text_object (node_list<char *> *copyFrom);
        text_object (bTreeType<char*> **source);
        ~text_object ();
    .
    ........etc ........

     It is a text object class, that encapsulates several types of data structures, which can be constructed from simple pointers to char, or from linked lists of pointers to char of two different types, depending...

    Read more »

  • It's ALIVE!

    glgorman07/17/2022 at 20:50 0 comments

    I've managed to get all of the functions from the compiler part, unit part, and declaration part to the point of compiling and linking; with still a lot of work remaining on the body part, which is still another 2000 lines, which mainly need WITH statements fixed up, as well as figuring out exactly how I am going to deal with the fact that PASCAL allows nested procedures, but C/C++, of course, does not.  Do I pass a pointer to any variables that I need to share from the nesting to the nested functions, - or is there a more elegant way that will work out better in the end; like if there is a sneaky way to use C++ nested classes (!) - which might work out really nicely if I could figure out how to use "placement new" to construct an instance of a virtual derived class which might somehow encapsulate an instance of an existing base object.  Now THAT would be nice.

    This method seems to work fairly straightforwardly.  Just derive DECLARATIONPART from PASCALCOMPILER and use a constructor that copies all of the member variables from the nesting class to the nested class, which is only about 2800 or so bytes, since a lot of stuff is in linked lists or tree structures, and we really only need to borrow a copy of the master pointers.

    DECLARATIONPART::DECLARATIONPART(COMPILERDATA *ptr)
    {
    	size_t sz = sizeof(COMPILERDATA);
    	COMPILERDATA *ptr2;
    	ptr2 = (COMPILERDATA*)this;
    	memcpy(ptr2,ptr,sz);
    }

    Now for something REALLY weird!  Why not just construct an object and then call a member function of that class by dereferencing the constructor, that is, without ever giving the object a name.  Apparently anonymous objects are allowed, but it would be nice if they looked a little prettier.

    if (!(options.INMODULE&&(SY==ENDSY)))
    {
    	CERROR(6);
    	SKIP(FSYS);
    	DECLARATIONPART(this).MAIN(FSYS);
    }
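    One consequence of calling a member function on an unnamed copy is worth spelling out: anything reached through a copied pointer is shared with the enclosing object, while any plain member that the nested part changes is lost when the temporary is destroyed. The sketch below shows this with hypothetical stand-in types (`compiler_data_sketch`, `declaration_part_sketch`, `compiler_sketch`), and it uses the base class copy constructor instead of memcpy.

```cpp
#include <vector>

// Hypothetical stand-in for COMPILERDATA: one pointer, one plain value.
struct compiler_data_sketch {
    std::vector<int> *symbols;  // shared through the copied pointer
    int error_count;            // copied by value
};

// Hypothetical "nested" part that borrows a copy of the outer state.
struct declaration_part_sketch : compiler_data_sketch {
    explicit declaration_part_sketch(const compiler_data_sketch *outer)
        : compiler_data_sketch(*outer) {}

    void MAIN(int sym) {
        symbols->push_back(sym);  // visible to the outer object
        ++error_count;            // changes only this temporary copy
    }
};

struct compiler_sketch : compiler_data_sketch {
    std::vector<int> table;

    compiler_sketch() {
        symbols = &table;
        error_count = 0;
    }

    void BLOCK() {
        // Unnamed temporary, destroyed at the end of this statement.
        declaration_part_sketch(this).MAIN(42);
    }
};
```

    After `BLOCK()` runs, `table` holds the new symbol but `error_count` in the outer object is still zero, so any counters or flags that the nested part updates would need to be copied back or reached through a pointer.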

    Or maybe use placement new to construct the derived class on top of the existing object hierarchy, overwriting the stack in the process, but perhaps figuring out a way to make the stack "look" exactly what it would look like, that is if C/C++ allowed nested procedures.  There is also the "alloca" method of reserving space on the stack, then perhaps manipulating object trees over that - which brings me back to placement new.  Yet for now - the method of copying the whole base object seems to work well enough to get me into debugging the declaration part - and hopefully sooner than never, I will actually be generating some code files that can actually be run.

  • Project Status Update

    glgorman07/14/2022 at 12:21 0 comments

    I have posted pdf versions of the project description, details, and project logs in the files section of this project, along with a pdf version of the original source code for the UCSD p-system compiler for more pleasant reading.  So much more fun if you are viewing with a tablet.  Additional source files are available on GitHub in standard form, and will be regularly updated as things continue to make progress.

    I read that CP/M has now been officially liberated.  So now it might be worthwhile to consider adding CP/M compatibility, instead of the original UCSD p-system file system.  Lots to do.

    In the meantime: Enjoy!

    I think I will completely rewrite the tokenizing function INSYMBOL even further, so as to, hopefully, completely eliminate all case statements, which of course get replaced with switch statements in C/C++.  Doesn't this look much nicer?

    namespace pascal0
    {
    key_info operators[] = 
    {
    	key_info(":=",BECOMES,NOOP),
    	key_info("(*",COMMENTSY,NOOP),
    	key_info("{",COMMENTSY,NOOP),
    	key_info("*)",SEPARATSY,NOOP),
    	key_info("}",SEPARATSY,NOOP),
    	key_info("<>",RELOP,NEOP),
    	key_info(">=",RELOP,GEOP),
    	key_info("<=",RELOP,LEOP),
    	key_info("..",COLON,NOOP),
    	key_info(".",PERIOD,NOOP),
    	key_info(":",COLON,NOOP),
    	key_info(";",SEMICOLON,NOOP),
    	key_info("^",ARROW,NOOP),
    	key_info("[",LBRACK,NOOP),
    	key_info("]",RBRACK,NOOP),
    	key_info("(",LPARENT,NOOP),
    	key_info(")",RPARENT,NOOP),
    	key_info(",",COMMA,NOOP),
    	key_info("+",ADDOP,PLUS),
    	key_info("-",ADDOP,MINUS),
    	key_info("*",MULOP,MUL),
    	key_info("/",MULOP,RDIV),
    	key_info("=",RELOP,EQOP),
    	key_info(">",RELOP,GTOP),	
    	key_info("<",RELOP,LTOP),
    	key_info("\'",STRINGCONST,NOOP),
    	key_info("",OTHERSY,NOOP),
    };
    };

    And thus another long chain of case statements bites the dust. 
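
    For illustration, here is a sketch of how a table like this might be scanned.  It assumes a simplified, hypothetical KeyInfo struct and match_operator function rather than the project's real key_info class, and it relies on the ordering that the table above appears to follow: longer spellings come before their own prefixes, and the empty string at the end catches everything else.

```cpp
#include <cassert>
#include <cstring>

enum Symbol { BECOMES, COLON, PERIOD, RELOP, OTHERSY };   // illustrative subset

struct KeyInfo {            // hypothetical simplified form of key_info
    const char *text;
    Symbol sy;
};

// Longer spellings come first so that ":=" wins over ":" and ".." over ".".
// The empty string at the end is a prefix of anything and acts as the default.
static const KeyInfo kOperators[] = {
    {":=", BECOMES}, {"<>", RELOP}, {"..", COLON},
    {".", PERIOD},   {":", COLON},  {"<", RELOP},
    {"", OTHERSY},
};

// Return the first table entry whose text is a prefix of `src`.
inline const KeyInfo &match_operator(const char *src) {
    assert(src != nullptr);
    const KeyInfo *k = kOperators;
    while (std::strncmp(src, k->text, std::strlen(k->text)) != 0) ++k;
    return *k;
}
```

    The caller can then advance the input by the length of the matched text, so the whole chain of cases collapses into one loop over data.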

  • Eliza Learns Pascal?

    glgorman • 07/10/2022 at 08:25 • 0 comments

    Well, sort of - this is going to be a LONG journey - but things are starting to move very quickly as of late.  Writing code is like that - months go by and NOTHING gets done - then in a couple of weekends I write a few thousand lines of code.  This should be fun after all.

    As if figuring out how to write a completely independent lexer that works as well as, or better than, the original wasn't enough work to do - then there is the notion of how to create ASTs (abstract syntax trees) that work not only with PASCAL and C/C++, but also with standard English grammar, which might contain dialog, or it might contain commands like "KILL ALL TROLLS!", or "Build me a time machine".  Oh, what fun.

    int PASCALSOURCE::SYMBOL_DUMP (LPVOID)
    {
        size_t i;
        CREATE_SYMLIST(NULL);
        size_t sz = m_symbols.size();
        for (i=0;i<sz;i++)
        {
            DEBUG_SY(m_symbols[i],FORSY,DOSY);
        }
        WRITELN(OUTPUT);
        WRITELN(OUTPUT,(int)sz," decoded");
        return 0;
    }

    Yet isn't it nice to contemplate being able to search a project for every FOR statement or every IF-THEN, or to make a list of all of the procedures in the source, to be better able to make sure the conversion is going correctly?  Yet why not search "The Adventures of Tom Sawyer" for every reference to whitewash preceded by or followed by fence, or for paragraphs that contain the name Injun Joe together with cave or caves, whether in the same, the preceding, or the following sentence, paragraph, or context?  Seems like a daunting task, but is it? Maybe, or maybe not.
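
    The "every FOR statement" kind of search can be sketched as a scan over a token vector.  This is not the project's DEBUG_SY; the Sym enum and select_ranges are hypothetical, and the sketch assumes the source has already been reduced to a flat list of symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum Sym { FORSY, DOSY, IDENT, BECOMES, INTCONST, TOSY, SEMICOLON };  // illustrative subset

// Return [first, last] index pairs for every run that starts at `from`
// and ends at the next `to` (inclusive).  Unterminated runs are dropped.
inline std::vector<std::pair<std::size_t, std::size_t>>
select_ranges(const std::vector<Sym> &toks, Sym from, Sym to) {
    assert(from != to);  // a start marker that is also the end marker is ambiguous
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t i = 0; i < toks.size(); ++i) {
        if (toks[i] != from) continue;
        for (std::size_t j = i + 1; j < toks.size(); ++j) {
            if (toks[j] == to) { out.emplace_back(i, j); i = j; break; }
        }
    }
    return out;
}
```

    The same function would select procedure headings if it were called with a PROCEDURE symbol and SEMICOLON, and nothing in it cares whether the tokens came from Pascal or from English text.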

    So, let's throw another log on the fire, and do it not with string manipulating functions like strcmp, strcpy, etc., but with abstract functions that can operate on and transform text objects, whether they are in the form of pure ASCII strings, or tables, or linked lists, or vectors, or connection maps that link tree structures, where the individual nodes of the subtrees point to linked lists or vectors of tokenized, and possibly compressed, input which might in turn reference tables of dictionary pointers.

    Writing, or re-writing a compiler is quite a chore.  Having some interesting code analysis tools makes things a LOT more interesting.

     Now, back to killing trolls, and inventing time travel?

    Not quite yet.  Let's suppose that we are analyzing real DNA.  One way of doing THAT involves lab techniques such as restriction enzymes, centrifuges, HPLC, CRISPR, DNA chip technology, etc., all so that we can later look at a genome, among other things, and have some way of doing something like "Find sequences that have CATTAGGTCTGA followed by ATCTACATCTAC", or something like that, with whatever else might be in the middle.  For example, we might have a partial analysis of some fragments of a real protein that we want to learn more about, and we need to find out where in some three billion base pairs it might be encoded, even if that is also in fragments, which might be subjected to later post-translation editing.

    Something like this looks VERY doable.

    DEBUG_GENE ( genome, "CATTAGGTCTGA" , "ATCTACATCTAC" );

     Just in case that sort of thing might be useful to someone.
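
    A rough sketch of what such a search might do underneath, using nothing but std::string::find.  The function name find_flanked and the max_gap parameter are assumptions for illustration; the DEBUG_GENE call above does not show a gap limit, and a real genome would call for an indexed search rather than a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Find every occurrence of `head` that is followed, within `max_gap` bases,
// by `tail`.  Returns (start of head, start of tail) pairs.
inline std::vector<std::pair<std::size_t, std::size_t>>
find_flanked(const std::string &genome, const std::string &head,
             const std::string &tail, std::size_t max_gap) {
    assert(!head.empty() && !tail.empty());
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    std::size_t h = genome.find(head);
    while (h != std::string::npos) {
        std::size_t after = h + head.size();          // first base past the head
        std::size_t t = genome.find(tail, after);     // nearest tail after the head
        if (t != std::string::npos && t - after <= max_gap) hits.emplace_back(h, t);
        h = genome.find(head, h + 1);                 // allow overlapping heads
    }
    return hits;
}
```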

    Suffice it to mention, also, that if you have been programming long enough, then you know what it is like to sprinkle your code with 1000's of TRACE statements, or to try to pipe debugging information to a logfile with fprintf statements, with all of the hassle that goes into creating the format strings, setting up and cleaning up buffers for all of that, and so on.  PASCAL does it so nicely - like this:

    WRITE (OUTPUT,' ',SYMBOL_NAMES2[p.SY]);
    WRITE (OUTPUT,'(',p.VAL.IVAL,')');

    Letting us use the PASCAL-style WRITE and WRITELN functions, which are perfectly happy to accept strings, characters, integers,...
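
    One way such helpers might be written in C++ is with variadic templates.  The sketch below is not the project's implementation: it assumes the first argument is a std::ostream, whereas the project passes its own OUTPUT object.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Base case: nothing left to write.
inline void WRITE(std::ostream &) {}

// Stream the first argument, then recurse on the rest, so strings,
// characters and integers can be mixed freely in one call.
template <class T, class... Rest>
void WRITE(std::ostream &out, const T &first, const Rest &...rest) {
    out << first;
    WRITE(out, rest...);
}

// Same as WRITE, followed by a newline.
template <class... Args>
void WRITELN(std::ostream &out, const Args &...args) {
    assert(out.good());  // a failed stream would silently drop the trace output
    WRITE(out, args...);
    out << '\n';
}
```

    No format strings and no buffers: each argument is handled by whatever operator<< exists for its type.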


  • The Road Much Less Travelled.

    glgorman • 07/08/2022 at 05:55 • 0 comments

    I cooked up an Eliza-based Pascal source tokenizer and tried using it to see how good it was (is) at doing some of the initial steps in converting the Pascal compiler to C++.  Although the initial results seem a bit cringe-worthy, they are not a complete disaster either.  So, I got really aggressive in creating a debugging environment for the Eliza-based tokenizer, as well as for the original, and the results together are looking quite promising.  First, a glimpse of the Eliza-based method.

    void PASCALCOMPILER::SOURCE_DUMP ()
    {
        ELIZA eliza;
        text_object source;
        char *buff1, *buf2;
        int line, count;
        if (SYSCOMM::m_source==NULL)
        {
            WRITELN(OUTPUT,"NULL source file");
            return;
        }
        count = (int)(*SYSCOMM::m_source).size();
        if (count==0)
        {
            WRITELN(OUTPUT,"Empty source file");
            return;
        }
        // Stop at the end of the source, or at a NULL line, before using it.
        for (line=0;line<count;line++)
        {
            buff1 = (*SYSCOMM::m_source)[line];
            if (buff1==NULL)
                break;
            source = buff1;
            eliza.process = source;
            eliza.pre_process (pascal2c);
            eliza.process >> buf2;
            WRITE(OUTPUT,buf2);
            // Assumes operator>> hands back a buffer allocated with new[].
            delete [] buf2;
        }
    }

    The mostly complete source for this mess can be found of course in the GitHub repositories for this project and will be updated regularly.  Be very afraid.  Use at your own risk.  Guaranteed to contain LOTS of bugs.  On the other hand - creating a bunch of debugging code that inspects each symbol as it is parsed, and which selects for things like whatever is found starting with every occurrence of the keyword PROCEDURE and continuing until the first SEMICOLON encountered thereafter - yields a very promising result - which looks (in part) like this.

    12762: PROCEDURE
    12763:  "ASSIGN"
    12764: (
    12765:  "EXTPROC"
    12766: :
    12767:  "NONRESIDENT"
    12768: )
    12769: ;
    
    12859: PROCEDURE
    12860:  "GENJMP"
    12861: (
    12862:  "FOP"
    12863: :
    12864:  "OPRANGE"
    12865: ;
    
    13012: PROCEDURE
    13013:  "LOAD"
    13014: ;
    
    13017: PROCEDURE
    13018:  "GENFJP"
    13019: (
    13020:  "FLBP"
    13021: :
    13022:  "LBP"
    13023: )
    13024: ;
    
    13048: PROCEDURE
    13049:  "GENLABEL"
    13050: (
    13051: VAR
    13052:  "FLBP"
    13053: :
    13054:  "LBP"
    13055: )
    13056: ;
    
    13078: PROCEDURE
    13079:  "PUTLABEL"
    13080: (
    13081:  "FLBP"
    13082: :
    13083:  "LBP"
    13084: )
    13085: ;
    
    13175: PROCEDURE
    13176:  "LOAD"
    13177: ;
    
    13469: PROCEDURE
    13470:  "STORE"
    13471: (
    13472: VAR
    13473:  "FATTR"
    13474: :
    13475:  "ATTR"
    13476: )
    13477: ;
    
    13560: PROCEDURE
    13561:  "LOADADDRESS"
    13562: ;

    Now, without taking another digression into a discussion of the meaning of the word SELECT, and what it might mean in the context of relational databases, it should be easy to see that if all we do is tokenize the input and then select sub-sections according to certain properties, then this leads to something that looks like it might be handled quite easily by some kind of #define TYPEGLOB_REORDER (A, B, C, ...) macro.  Even if I am not proceeding at this point with trying to do a pure preprocessor macro-based language scheme.  Somewhere, over the rainbow, maybe someday?
