That is sort of funny. What you have described is nearly the same
as what I have implemented for the operator parser in CGI::Ex::Template
(CGI::Ex::Template is part of the CGI::Ex suite and implements a full
fledged TT2 engine and much of the planned TT3 spec).
To summarize:
I parse chains of operators and variables into an array - so
a + b.c - 1.3 * 2 + 3
would show up as:
@tree = qw(a + b.c - 1.3 * 2 + 3);
As I am parsing I also create a hash of found operators and I note their
precedence which I get from a table (it is possible to add any amount
of other operators to the table). This hash for this example would look like
%found = ('+' => 85, '-' => 85, '*' => 90)
I then call a method called "apply_precedence" passing it the tree and the %found
hash. Apply precedence takes the highest precedence operator and splits the
@tree array into sub trees whenever it finds an operator. Each of those
sub trees recursively calls "apply_precedence" until each sub tree only has one
element. The returned elements are placed in an execution optree that looks like
'+', 'a', 'b.c'. The previous example would parse down to something like (but with
a little bit different syntax for encoding the parsed variables, operators and arguments
to expressions:
['*', ['+', 'a', ['-', 'b.c', 1.3]], ['+', 2, 3]]
As a side note, a template language that I wrote for my current company starts with
this syntax so the parser doesn't have to worry about precedence.
I would like to know if there is a formal name for this type of parsing. The rest of the
parser is recursive in nature, but just sort of flattens out when it reaches an operator.
The grammar for this parser is hard coded - but is essentially contained with in the
parse_variable, parse_args, and apply_precedence methods (parse_variable should more aptly
be called parse_expression as it handles variables, numbers, var or num + var or num, parens,
{} and () constructs and string literals. I am always interested in seeing somebody write
something faster so if somebody would like to hack away - they are certainly welcome to.
Apply precedence also has logic to allow for it to correctly split up ternary and nested
ternary operators. Writing this also has helped me see that there may be ways to optimize things
even more.
The perldoc of CGI::Ex::Template includes a section on variables and how they are stored. To quote a large
section:
=head1 VARIABLE PARSE TREE
CGI::Ex::Template parses templates into an tree of operations. Even
variable access is parsed into a tree. This is done in a manner
somewhat similar to the way that TT operates except that nested
variables such as foo.bar|baz contain the '.' or '|' in between each
name level. Operators are parsed and stored as part of the variable (
+it
may be more appropriate to say we are parsing a term or an expression)
+.
The following table shows a variable or expression and the correspondi
+ng parsed tree
(this is what the parse_variable method would return).
one [ 'one', 0 ]
one() [ 'one', [] ]
one.two [ 'one', 0, '.', 'two', 0 ]
one|two [ 'one', 0, '|', 'two', 0 ]
one.$two [ 'one', 0, '.', ['two', 0 ], 0 ]
one(two) [ 'one', [ ['two', 0] ] ]
one.${two().three} [ 'one', 0, '.', ['two', [], '.', 'three', 0],
+ 0]
2.34 2.34
"one" "one"
"one"|length [ \"one", 0, '|', 'length', 0 ]
"one $a two" [ \ [ '~', 'one ', ['a', 0], ' two' ], 0 ]
[0, 1, 2] [ \ [ 'array', 0, 1, 2 ], 0 ]
[0, 1, 2].size [ \ [ 'array', 0, 1, 2 ], 0, '.', 'size', 0 ]
['a', a, $a ] [ \ [ 'array', 'a', ['a', 0], [['a', 0], 0] ],
+0]
{a => 'b'} [ \ [ 'hash', 'a', 'b' ], 0 ]
{a => 'b'}.size [ \ [ 'hash', 'a', 'b' ], 0, '.', 'size', 0 ]
{$a => b} [ \ [ 'hash', ['a', 0], ['b', 0] ], 0 ]
1 + 2 [ \ [ '+', 1, 2 ], 0]
a + b [ \ [ '+', ['a', 0], ['b', 0] ], 0 ]
a * (b + c) [ \ [ '*', ['a', 0], [ \ ['+', ['b', 0], ['c',
+0]], 0 ]], 0 ]
(a + b) [ \ [ '+', ['a', 0], ['b', 0] ]], 0 ]
(a + b) * c [ \ [ '*', [ \ [ '+', ['a', 0], ['b', 0] ], 0 ]
+, ['c', 0] ], 0 ]
a ? b : c [ \ [ '?', ['a', 0], ['b', 0], ['c', 0] ], 0 ]
a || b || c [ \ [ '||', ['a', 0], [ \ [ '||', ['b', 0], ['c
+', 0] ], 0 ] ], 0 ]
! a [ \ [ '!', ['a', 0] ], 0 ]
Some notes on the parsing.
Operators are parsed as part of the variable and become part of th
+e variable tree.
Operators are stored in the variable tree using a reference to the
+ arrayref - which
allows for quickly descending the parsed variable tree and determi
+ning that the next
node is an operator.
Parenthesis () can be used at any point in an expression to disamb
+iguate precedence.
"Variables" that appear to be literal strings or literal numbers
are returned as the literal (no operator tree).
The following perl can be typed at the command line to view the parsed
+ variable tree:
perl -e 'use CGI::Ex::Template; print CGI::Ex::Template::dump_pars
+e("foo.bar + 2")."\n"'
Also the following can be included in a template to view the output in
+ a template:
[% USE cet = CGI::Ex::Template %]
[%~ cet.dump_parse('foo.bar + 2').replace('\s+', ' ') %]
The following is the long - but useful operators tree (I recently cleaned up the operator
table to be a little more clear for version 2.02 which the head version on CPAN). The table
is used to generate the regexes necessary for parsing operators.
$OPERATORS = [
# type precedence symbols action (undef means pl
+ay_operator will handle)
['prefix', 98, ['++'], undef
+ ],
['prefix', 98, ['--'], undef
+ ],
['postfix', 98, ['++'], undef
+ ],
['postfix', 98, ['--'], undef
+ ],
['infix', 96, ['**', 'pow'], sub { $_[0] ** $_[
+1] } ],
['prefix', 93, ['!'], sub { ! $_[0]
+ } ],
['prefix', 93, ['-'], sub { @_ == 1 ? 0 - $_
+[0] : $_[0] - $_[1] } ],
['infix', 90, ['*'], sub { $_[0] * $_[
+1] } ],
['infix', 90, ['/'], sub { $_[0] / $_[
+1] } ],
['infix', 90, ['div', 'DIV'], sub { int($_[0] / $_[
+1]) } ],
['infix', 90, ['%', 'mod', 'MOD'], sub { $_[0] % $_[
+1] } ],
['infix', 85, ['+'], sub { $_[0] + $_[
+1] } ],
['infix', 85, ['-'], sub { @_ == 1 ? 0 - $_
+[0] : $_[0] - $_[1] } ],
['infix', 85, ['~', '_'], sub { join "", @_
+ } ],
['infix', 80, ['<'], sub { $_[0] < $_[
+1] } ],
['infix', 80, ['>'], sub { $_[0] > $_[
+1] } ],
['infix', 80, ['<='], sub { $_[0] <= $_[
+1] } ],
['infix', 80, ['>='], sub { $_[0] >= $_[
+1] } ],
['infix', 80, ['lt'], sub { $_[0] lt $_[
+1] } ],
['infix', 80, ['gt'], sub { $_[0] gt $_[
+1] } ],
['infix', 80, ['le'], sub { $_[0] le $_[
+1] } ],
['infix', 80, ['ge'], sub { $_[0] ge $_[
+1] } ],
['infix', 75, ['==', 'eq'], sub { $_[0] eq $_[
+1] } ],
['infix', 75, ['!=', 'ne'], sub { $_[0] ne $_[
+1] } ],
['infix', 70, ['&&'], undef
+ ],
['infix', 65, ['||'], undef
+ ],
['infix', 60, ['..'], sub { $_[0] .. $_[
+1] } ],
['ternary', 55, ['?', ':'], undef
+ ],
['assign', 53, ['+='], sub { $_[0] + $_[
+1] } ],
['assign', 53, ['-='], sub { $_[0] - $_[
+1] } ],
['assign', 53, ['*='], sub { $_[0] * $_[
+1] } ],
['assign', 53, ['/='], sub { $_[0] / $_[
+1] } ],
['assign', 53, ['%='], sub { $_[0] % $_[
+1] } ],
['assign', 53, ['**='], sub { $_[0]** $_[
+1] } ],
['assign', 53, ['~=', '_='], sub { $_[0] . $_[
+1] } ],
['assign', 52, ['='], undef
+ ],
['prefix', 50, ['not', 'NOT'], sub { ! $_[0]
+ } ],
['infix', 45, ['and', 'AND'], undef
+ ],
['infix', 40, ['or', 'OR'], undef
+ ],
];
Again - it is sort of funny to see the same ideas discovered and rediscovered over and over.
my @a=qw(random brilliant braindead); print $a[rand(@a)];
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.