comment on

That is sort of funny. What you have described is nearly the same as what I have implemented for the operator parser in CGI::Ex::Template (CGI::Ex::Template is part of the CGI::Ex suite and implements a full fledged TT2 engine and much of the planned TT3 spec).

To summarize:

I parse chains of operators and variables into an array - so

  a + b.c - 1.3 * 2 + 3
[download]

would show up as:

  @tree = qw(a + b.c - 1.3 * 2 + 3);
[download]

As I am parsing I also create a hash of found operators and I note their precedence which I get from a table (it is possible to add any amount of other operators to the table). This hash for this example would look like

  %found = ('+' => 85, '-' => 85, '*' => 90)
[download]

I then call a method called "apply_precedence" passing it the tree and the %found hash. Apply precedence takes the highest precedence operator and splits the @tree array into sub trees whenever it finds an operator. Each of those sub trees recursively calls "apply_precedence" until each sub tree only has one element. The returned elements are placed in an execution optree that looks like '+', 'a', 'b.c'. The previous example would parse down to something like (but with a little bit different syntax for encoding the parsed variables, operators and arguments to expressions:

  ['*', ['+', 'a', ['-', 'b.c', 1.3]], ['+', 2, 3]]
[download]

As a side note, a template language that I wrote for my current company starts with this syntax so the parser doesn't have to worry about precedence.

I would like to know if there is a formal name for this type of parsing. The rest of the parser is recursive in nature, but just sort of flattens out when it reaches an operator.

The grammar for this parser is hard coded - but is essentially contained with in the parse_variable, parse_args, and apply_precedence methods (parse_variable should more aptly be called parse_expression as it handles variables, numbers, var or num + var or num, parens, {} and () constructs and string literals. I am always interested in seeing somebody write something faster so if somebody would like to hack away - they are certainly welcome to. Apply precedence also has logic to allow for it to correctly split up ternary and nested ternary operators. Writing this also has helped me see that there may be ways to optimize things even more.

The perldoc of CGI::Ex::Template includes a section on variables and how they are stored. To quote a large section:

=head1 VARIABLE PARSE TREE

CGI::Ex::Template parses templates into an tree of operations.  Even
variable access is parsed into a tree.  This is done in a manner
somewhat similar to the way that TT operates except that nested
variables such as foo.bar|baz contain the '.' or '|' in between each
name level.  Operators are parsed and stored as part of the variable (
+it
may be more appropriate to say we are parsing a term or an expression)
+.

The following table shows a variable or expression and the correspondi
+ng parsed tree
(this is what the parse_variable method would return).

    one                [ 'one',  0 ]
    one()              [ 'one',  [] ]
    one.two            [ 'one',  0, '.', 'two',  0 ]
    one|two            [ 'one',  0, '|', 'two',  0 ]
    one.$two           [ 'one',  0, '.', ['two', 0 ], 0 ]
    one(two)           [ 'one',  [ ['two', 0] ] ]
    one.${two().three} [ 'one',  0, '.', ['two', [], '.', 'three', 0],
+ 0]
    2.34               2.34
    "one"              "one"
    "one"|length       [ \"one", 0, '|', 'length', 0 ]
    "one $a two"       [ \ [ '~', 'one ', ['a', 0], ' two' ], 0 ]
    [0, 1, 2]          [ \ [ 'array', 0, 1, 2 ], 0 ]
    [0, 1, 2].size     [ \ [ 'array', 0, 1, 2 ], 0, '.', 'size', 0 ]
    ['a', a, $a ]      [ \ [ 'array', 'a', ['a', 0], [['a', 0], 0] ], 
+0]
    {a  => 'b'}        [ \ [ 'hash',  'a', 'b' ], 0 ]
    {a  => 'b'}.size   [ \ [ 'hash',  'a', 'b' ], 0, '.', 'size', 0 ]
    {$a => b}          [ \ [ 'hash',  ['a', 0], ['b', 0] ], 0 ]
    1 + 2              [ \ [ '+', 1, 2 ], 0]
    a + b              [ \ [ '+', ['a', 0], ['b', 0] ], 0 ]
    a * (b + c)        [ \ [ '*', ['a', 0], [ \ ['+', ['b', 0], ['c', 
+0]], 0 ]], 0 ]
    (a + b)            [ \ [ '+', ['a', 0], ['b', 0] ]], 0 ]
    (a + b) * c        [ \ [ '*', [ \ [ '+', ['a', 0], ['b', 0] ], 0 ]
+, ['c', 0] ], 0 ]
    a ? b : c          [ \ [ '?', ['a', 0], ['b', 0], ['c', 0] ], 0 ]
    a || b || c        [ \ [ '||', ['a', 0], [ \ [ '||', ['b', 0], ['c
+', 0] ], 0 ] ], 0 ]
    ! a                [ \ [ '!', ['a', 0] ], 0 ]

Some notes on the parsing.

    Operators are parsed as part of the variable and become part of th
+e variable tree.

    Operators are stored in the variable tree using a reference to the
+ arrayref - which
    allows for quickly descending the parsed variable tree and determi
+ning that the next
    node is an operator.

    Parenthesis () can be used at any point in an expression to disamb
+iguate precedence.

    "Variables" that appear to be literal strings or literal numbers
    are returned as the literal (no operator tree).

The following perl can be typed at the command line to view the parsed
+ variable tree:

    perl -e 'use CGI::Ex::Template; print CGI::Ex::Template::dump_pars
+e("foo.bar + 2")."\n"'

Also the following can be included in a template to view the output in
+ a template:

    [% USE cet = CGI::Ex::Template %]
    [%~ cet.dump_parse('foo.bar + 2').replace('\s+', ' ') %]
[download]

The following is the long - but useful operators tree (I recently cleaned up the operator table to be a little more clear for version 2.02 which the head version on CPAN). The table is used to generate the regexes necessary for parsing operators.

$OPERATORS = [
    # type      precedence symbols              action (undef means pl
+ay_operator will handle)
    ['prefix',  98,        ['++'],              undef                 
+                      ],
    ['prefix',  98,        ['--'],              undef                 
+                      ],
    ['postfix', 98,        ['++'],              undef                 
+                      ],
    ['postfix', 98,        ['--'],              undef                 
+                      ],
    ['infix',   96,        ['**', 'pow'],       sub {     $_[0] ** $_[
+1]                  } ],
    ['prefix',  93,        ['!'],               sub {   ! $_[0]       
+                    } ],
    ['prefix',  93,        ['-'],               sub { @_ == 1 ? 0 - $_
+[0] : $_[0] - $_[1] } ],
    ['infix',   90,        ['*'],               sub {     $_[0] *  $_[
+1]                  } ],
    ['infix',   90,        ['/'],               sub {     $_[0] /  $_[
+1]                  } ],
    ['infix',   90,        ['div', 'DIV'],      sub { int($_[0] /  $_[
+1])                 } ],
    ['infix',   90,        ['%', 'mod', 'MOD'], sub {     $_[0] %  $_[
+1]                  } ],
    ['infix',   85,        ['+'],               sub {     $_[0] +  $_[
+1]                  } ],
    ['infix',   85,        ['-'],               sub { @_ == 1 ? 0 - $_
+[0] : $_[0] - $_[1] } ],
    ['infix',   85,        ['~', '_'],          sub { join "", @_     
+                    } ],
    ['infix',   80,        ['<'],               sub {     $_[0] <  $_[
+1]                  } ],
    ['infix',   80,        ['>'],               sub {     $_[0] >  $_[
+1]                  } ],
    ['infix',   80,        ['<='],              sub {     $_[0] <= $_[
+1]                  } ],
    ['infix',   80,        ['>='],              sub {     $_[0] >= $_[
+1]                  } ],
    ['infix',   80,        ['lt'],              sub {     $_[0] lt $_[
+1]                  } ],
    ['infix',   80,        ['gt'],              sub {     $_[0] gt $_[
+1]                  } ],
    ['infix',   80,        ['le'],              sub {     $_[0] le $_[
+1]                  } ],
    ['infix',   80,        ['ge'],              sub {     $_[0] ge $_[
+1]                  } ],
    ['infix',   75,        ['==', 'eq'],        sub {     $_[0] eq $_[
+1]                  } ],
    ['infix',   75,        ['!=', 'ne'],        sub {     $_[0] ne $_[
+1]                  } ],
    ['infix',   70,        ['&&'],              undef                 
+                      ],
    ['infix',   65,        ['||'],              undef                 
+                      ],
    ['infix',   60,        ['..'],              sub {     $_[0] .. $_[
+1]                  } ],
    ['ternary', 55,        ['?', ':'],          undef                 
+                      ],
    ['assign',  53,        ['+='],              sub {     $_[0] +  $_[
+1]                  } ],
    ['assign',  53,        ['-='],              sub {     $_[0] -  $_[
+1]                  } ],
    ['assign',  53,        ['*='],              sub {     $_[0] *  $_[
+1]                  } ],
    ['assign',  53,        ['/='],              sub {     $_[0] /  $_[
+1]                  } ],
    ['assign',  53,        ['%='],              sub {     $_[0] %  $_[
+1]                  } ],
    ['assign',  53,        ['**='],             sub {     $_[0]**  $_[
+1]                  } ],
    ['assign',  53,        ['~=', '_='],        sub {     $_[0] .  $_[
+1]                  } ],
    ['assign',  52,        ['='],               undef                 
+                      ],
    ['prefix',  50,        ['not', 'NOT'],      sub {   ! $_[0]       
+                    } ],
    ['infix',   45,        ['and', 'AND'],      undef                 
+                      ],
    ['infix',   40,        ['or', 'OR'],        undef                 
+                      ],
];
[download]

Again - it is sort of funny to see the same ideas discovered and rediscovered over and over.

my @a=qw(random brilliant braindead); print $a[rand(@a)];

In reply to Re: Operator Precedence Parser by Rhandom
in thread Operator Precedence Parser by bart

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Welcome to the Monastery
	PerlMonks