MATLAB Examples


Growing a Compiler, Chapter 3

format compact  % Help MATLAB save screen space

3 GEM Versions 2 and 3

Treatment of whitespace and character classes.

3.1 Built-in scan

It is not too hard to make the second extension of iog, that automatically skips whitespace during interpretation. The trick is to skip blanks and newlines every place they can show up. There are just a few such places in each of the PARSE, SEARCH and BACK modes.

The new extension has its former capabilities and in addition no need for removing whitespace ahead of time. The new version can be seen to work by using the pretty version of nowhite as grammar input.

G   = gem2();   % Instantiate the object (built-in de-whiting).
GEM =;
fprintf('''%s''', GEM('x Y z', G.nowhite))

If you are interested in the new files iog2.h, iog2.c and gem2.m, you can download this File Exchange submission and look at them in the MATLAB editor.

3.2 Built-in character classes

The grammar executing machine gem0 treats all letters the same: letters name rules to be found in the IOG. In gem3 the eight character class letters are each processed separately. The call to has a flag which can contain any or all of TdluaDLUA. If, for instance, a is in the flag string, gem3 will act as though class asciiCFG is appended to the second argument. The consequence is that the actual IOGs are much shorter and the time to execute is much faster. For example, consider an IOG that recognizes strings of digits:

G   = gem3();   % Instantiate the object (built-in char classes)
GEM =;
fprintf('res=''%s''\n', GEM('182746', 'g=dg;g=d;', 'd'));

3.3 Size comparisons

Size comparisons for the versions so far

!wc -l iog0.c  iog0.h
    136    iog0.c 
     76    iog0.h 
    212    total 
!wc -l iog1.c  iog1.h
    160    iog1.c 
    108    iog1.h 
    268    total 
!wc -l iog2.c  iog2.h
    163    iog2.c 
    118    iog2.h 
    281    total 
!wc -l iog3.c  iog3.h
    230    iog3.c 
    204    iog3.h 
    434    total 

If you are interested in the new files iog3.h, iog3.c and gem3.m, you can download this File Exchange submission and look at them in the MATLAB editor.

3.4 Multi-character input and output symbols

The limitation of character input and output symbols to single characters was mentioned as a restriction that would be removed by bootstrapping. This is accomplished by an IOG nomulti. A special case is needed in nomulti for the characters ' and "; neither can be part of a multi-character symbol they delimit. Otherwise, multicharacter symbols are broken into a sequence of single-character symbols. nomulti applied to two trivial grammars

fprintf('123=%s\n', GEM('', GEM('r="123";', G.nomulti, 'A')));
fprintf('null=%s\n', GEM('123', GEM('r=''123'';', G.nomulti, 'A')));

3.5 A new pretty

Having allowed whitespace in IOGs, pretty printing also has to accept IOGs with whitespace but is still obligated to print just enough whitespace so as to make the IOG readable. Here the new pretty is used to display the IOG for dealing with multiple character input and output symbols. The trick is in the recursively defined names i and o.

nm = GEM(G.nomulti, G.pretty, 'ULA');  % using the new pretty
fprintf('--nomulti IOG--\n%s', nm);
--nomulti IOG--
g = P;
p = I I I;
p = O O O;
p = ''' i;
p = '"' o;
p = A;
i = ''';
i = "'" A "'" i;
o = '"';
o = """ A """ o;
I = ''' "'";
O = '"' """;
P = p P;
P =;

3.6 left-associative expressions

Right-associative expressions were defined as part of demonstrating the inverter. Using a technique similar to that found using in functional languages, left-associate (normal) expressions can be described as well.

postfix = GEM(G.postfix, G.pretty, 'ULA');
fprintf('postfix IOG:\n%s', postfix);
postfix IOG:
g = e;
e = t R;
r = '+' t "+";
r = '-' t "-";
t = f S;
s = '*' f "*";
s = '/' f "/";
f = L;
f = D;
f = '(' e ')';
R = r R;
R =;
S = s S;
S =;

The following code makes postfix from an expression

fprintf('postfix expr = %s', GEM('1+c*d+3', G.postfix, 'LD'));
postfix expr = 1cd*+3+

Just to complete the example, the following code makes prefix from an expression

fprintf('prefix expr = %s', GEM('1+c*d+3', G.prefix, 'LD'));
prefix expr = +1+*cd3

3.7 X86 float assembly language

The Intel X86 floating point unit (FPS) is a stack, thus is accessed with stack instructions such as fld (push onto FPS), fadd (add top elements on the stack). The postfix in the previous example only needs to be reformatted to give an assembly-code like equivalent.

x86expr = GEM(G.x86, G.pretty, 'ULA');
fprintf('x86 IOG:\n%s', x86expr);
x86 IOG:
g = e;
e = t R;
r = '+' t "fadd
r = '-' t "fsub
t = f S;
s = '*' f "fmul
s = '/' f "fdiv
f = "fld " L "
f = "fld =" D "
f = '(' e ')';
R = r R;
R =;
S = s S;
S =;

As you can see by looking at the IOG above, it was convenient to use the multi-character form of the IOG. nomulti has to be used on x86 to make it executable by gem3(). Applied to the example above, the result is:

asm1 = GEM(G.x86, G.nomulti, 'A');
res1 = GEM('1+c*d+3', asm1, 'LD');
fprintf('x86 assembly code\n%s', res1);
x86 assembly code
fld =1
fld c
fld d
fld =3

The story continues with the next chapter.