Close

Fixing Unary Minus

A project log for Simple Compiler

A very simple compiler for minimalist home brew CPUs

agpcooperagp.cooper 07/08/2017 at 00:481 Comment

Language Structure

Programming languages can the presented/described in format called Backus-Naur Form (BNF). Here is a proper example from "Let's Build a Compiler!" Jack W. Crenshaw:

<b-expression> ::= <b-term> [<orop> <b-term>]*
<b-term> ::= <not-factor> [AND <not-factor>]*
<not-factor> ::= [NOT] <b-factor>
<b-factor> ::= <b-literal> | <b-variable> | <relation>
<relation> ::= | <expression> [<relop> <expression]
<expression> ::= <term> [<addop> <term>]*
<term> ::= <signed factor> [<mulop> factor]*
<signed factor>::= [<addop>] <factor>
<factor> ::= <integer> | <variable> | (<b-expression>)
Here is what I decoded from Simple Compiler in a "bnf" like form:
Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
Term           -> Factor
               -> Factor * Factor+
               -> Factor / Factor+
Expression     -> Term
               -> + Term
               -> - Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

(Note, my decoded format is as "coded" and is upside down compared to the bnf format.)

Okay what does all this mean? Basically the minimum number of precedence levels (from "Let's Build a Compiler!" Jack W. Crenshaw) is 8:

LevelSyntax ElementOperator
0factorliteral, variable
1signed factorunary minus
2term*, /
3expression+, -
4b-factorliteral, variable, relop
5not-factorNOT
6b-termAND
7b-expressionOR, XOR

The Simple compiler only has 4, so it was doomed from the start. Even without the bit operations it still needed 5 levels. What failed was the unary minus. This is why I had to put brackets around the "-5" the the code:

begin
  A=4;
  B=1;
  if A<B then
    write A*10<b;
  else begin
    write 1<B*10;
    write B*10, (a+b)*(-5);
  end
  C=-1;
  write C;
end

The original coder did try with "Expression -> - Term" as the C=-1 does work, but a true unary minus needs a higher precedence than "*" for "(a+b)*-5" to work.

So I need at added "SignedFactor" between "Factor" and "Term" based on Crenshaw's bnf, and remove the "Expression -> + Term" and "Expression -> - Term":

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor           
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

It took a little while to get my head around the internals of the recursive procedure calls but eventually I got it to work:

void SignedFactor(void)
{
  char op;
  
  op=tTok;
  if ((op==_plus)||(op==_minus)) {
    GetNextToken();
  }
  Factor();
  if (op==_plus) {
    // Nothing to do
  } else if (op==_minus) {
    Emit(_movBxAx);
    Emit(_xorAxAx);
    Emit(_subBx);
  }
}
In the above code:

Here is the current language structure for Simple Compiler:

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
               -> SignedFactor % SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expression
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+
Statement      -> Begin
               -> While
               -> If
               -> Write
               -> Read
               -> Assignment
Begin          -> Statement
               -> End
While          -> BoolExpression Statement
If             -> BoolExpression then Statement
               -> BoolExpression then Statement "else" Statement
Write          -> BoolExpression , BoolExpression+
Read           -> Input
Assignment     -> Identifier = BoolExpression
The only thing not shown above is that the semicolon (i.e. ";") is treated as a white space.

Discussions

Ditrianum wrote 03/15/2020 at 09:53 point

Crenshaw's (expression) installments are not so straight forward. In chapter 6 initial implementations have to be adjusted and renamed several times and it is not always clear what is meant by "rename the previous one to...". Then, in chapter 10 things are reorganized and renamed again. Only by comparing the chapters and copy-pasting the final implementations one can derive a correctly working expression library, which can be used as a guide: https://sharpbasic.com/pascal/exp.pas

  Are you sure? yes | no