betacentauri has asked for the wisdom of the Perl Monks concerning the following question:
O Monks,
I need to parse some text expressions which are a lot like the arithmetic expressions consisting of +, * and constants, so being able to parse arithmetic expressions would be more than enough. What I need is a syntax tree, where every leaf node is a constant, and the interior nodes are "+" and "*" tokens.
Using Tree::Parser seemed like a good deal to me because I need to do some tree traversal after parsing, and the Tree::Simple object returned by Tree::Parser seems to provide suitable methods.
However, the useNestedParensFilter() filter coming with the module is not quite what I need. I'm lost at how to write a proper parser to be given to setParseFilter(), as the iterator->next() function seems to break lexical units only on parentheses or whitespace, while I need to detect "+" and "*" tokens alone as well.
Can someone explain how to build a syntax tree for {+,*,A..Z,(,)} with Tree::Parser?
TIA
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 07:16 UTC
|
#!/usr/bin/perl
use warnings;
use strict;
use Marpa::R2;
package Element;
sub newNode {
my ($value, $left, $right) = @_;
return bless {left => $left, right => $right, value => $value}, 'E
+lement';
}
sub doOp {
my @params = @_;
return newNode(@params[2, 1, 3],);
}
sub doConst {
my @params = @_;
return newNode($params[1]);
}
sub dump {
my ($node, $indent) = @_;
$indent //= '';
$node->{left}->dump("$indent ") if $node->{left};
print "$indent$node->{value}\n";
$node->{right}->dump("$indent ") if $node->{right};
}
package main;
my $syntax = <<'SYNTAX';
lexeme default = latm => 1
expression ::= expression '+' term action => doOp | term action => ::f
+irst
term ::= term '*' factor action => doOp | factor action => ::first
factor ::= constant action => doConst
constant ~ digits
digits ~ [\d]+
:discard ~ spaces
spaces ~ [\s]+
SYNTAX
my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax});
my $input = '1 * 2 + 3 * 4';
my $root = $grammar->parse(\$input, 'Element');
($$root)->dump();
Prints:
1
*
2
+
3
*
4
No idea if that helps or not, but it was interesting to have a play.
Premature optimization is the root of all job security
| [reply] [d/l] [select] |
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 08:31 UTC
|
#!/usr/bin/perl
use warnings;
use strict;
use Marpa::R2;
use Tree::Binary;
package Construct;
sub newNode {
my ($value, $left, $right) = @_;
my $node = Tree::Binary->new($value);
$node->setLeft($left) if $left;
$node->setRight($right) if $right;
return $node;
}
sub doOp {
my @params = @_;
return newNode(@params[2, 1, 3],);
}
sub doConst {
my @params = @_;
return newNode($params[1]);
}
package main;
my $syntax = <<'SYNTAX';
lexeme default = latm => 1
expression ::= expression '+' term action => doOp | term action => ::f
+irst
term ::= term '*' factor action => doOp | factor action => ::first
factor ::= constant action => doConst
constant ~ digits
digits ~ [\d]+
:discard ~ spaces
spaces ~ [\s]+
SYNTAX
my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax});
my $input = '1 * 2 + 3 * 4';
my $root = $grammar->parse(\$input, 'Construct');
($$root)->traverse(sub {print " " x $_[0]->{_depth}, $_[0]->getNodeVa
+lue(), "\n"});
Prints:
+
*
1
2
*
3
4
Premature optimization is the root of all job security
| [reply] [d/l] [select] |
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 03:54 UTC
|
It doesn't look to me like Tree::Parser is the right tool, but then I have no real idea what it is you are trying to achieve. Try filling us in on the back story somewhat. At least show us the nature of the syntax you are trying to parse and give us a hint about how you intend to use the parse tree.
Premature optimization is the root of all job security
| [reply] |
|
Just WOW! Thank you very much for your responses! I had been away from Perl for a while, had not known about Marpa before.
If you insist, I'll tell something about this. I had not done this before 1) in the interest of keeping my question short and 2) by fear of being told that this is already done, and way better than I am capable of. Which is exactly what's going to ensue now. Trust me. So be it.
I am a lazy Computer Networks teacher. I want to have a minimalist language to describe a (computer) network topology and diagram layout. My idea is to cruft a file like
R:router
S1:switch
S2:switch
H1:host
H2:host
H3:host
R-S1
R-S2
H1-S1
H2-S1
H3-S2
R;(S1;(H1,H2)),(S2;H1)
This would describe a simple network where a router is linked to two switches, S1 linked to hosts H1 and H2, and then S2, linked to H3.
The first stances declare topology elements in association with some predefined icons. The middle part tells what's linked to what. The final line describes the diagram layout. In my mind "A;B" means A is graphically above B, and "A,B" means A is to the left of B. This final line is what I am trying to parse.
After having the syntax tree I want to apply some simple algorithms to get the bounding box dimensions for each subtree and finally draw the graph with properly centered icons and links.
I was not expecting binary trees as a result, I'll have to digest that. Maybe I'll be fine with them.
Thank you again!
| [reply] [d/l] |
|
Reinventing the wheel as a learning exercise is a fine thing. Reinventing a major wheel as part of getting some other primary task done is generally unproductive and a "bad thing"™. In this case I think "learning exercise" applies so go for it.
Providing a succinct version of the back story always helps guide any coding decisions that need to be made. Trying to "keep the question short" almost always ends up with a series of questions to elicit the "unimportant details" that were left out meaning much more work for everyone and a lot longer until you get the help you were looking for.
You could almost as easily use Tree::Simple instead to get an n-ary tree. So, taking something like your example and using Tree::Simple:
#!/usr/bin/perl
use warnings;
use strict;
use Marpa::R2;
use Tree::Simple;
package Construct;
sub doMap {
my @params = @_;
my $child = 'ARRAY' eq ref $params[3] ? $params[3][1] : $params[3]
+;
return $params[1]->addChild($child) if $child->getNodeValue() ne '
+root';
my @children = $child->getAllChildren();
return $params[1]->addChildren(@children);
}
sub doMapSib {
my @params = @_;
my ($lhs, $rhs) = map {'ARRAY' eq ref $_ ? $_->[1] : $_} @params[1
+, 3];
my $root;
if ($lhs->getNodeValue() eq 'root') {
$root = $lhs;
$lhs = undef;
} elsif ($rhs->getNodeValue() eq 'root') {
$root = $rhs;
$rhs = undef;
} else {
$root = Tree::Simple->new('root');
}
$root->addChild($lhs) if $lhs;
$root->addChild($rhs) if $rhs;
return $root;
}
my $tail = '';
sub doTail {
my @params = @_;
$tail = $params[1];
return;
}
sub doCode {
my @params = @_;
my $node = Tree::Simple->new("$params[1]$tail");
$tail = '';
return $node;
}
package main;
my $syntax = <<'SYNTAX';
lexeme default = latm => 1
network ::= ident ';' subnet action => doMap
| subnet action => ::first
subnet ::= sibling ',' subnet action => doMapSib
| sibling action => ::first
sibling ::= '(' network ')' action => [values]
| ident action => ::first
ident ::= letter tail action => doCode
| letter action => doCode
tail ::= digits action => doTail
letter ~ [A-Z]+
digits ~ [\d]+
:discard ~ spaces
spaces ~ [\s]+
SYNTAX
my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax});
my $input = 'R;(S1;(H1,H2,H4,H5)),(S2;H3)';
my $tree = ${$grammar->parse(\$input, 'Construct')};
my $root = Tree::Simple->new('root')->addChild($tree);
my $depth = $root->getHeight();
($root)->traverse(
sub {
print " " x ($depth - $_[0]->getHeight()), $_[0]->getNodeValu
+e(), "\n";
}
);
Prints:
R
S1
H4
H5
H2
H1
S2
H3
Premature optimization is the root of all job security
| [reply] [d/l] [select] |
Re: Parsing syntax trees
by Anonymous Monk on Dec 30, 2015 at 04:00 UTC
|
| [reply] |
|
|