It looks like you're trying to parse Graphviz files. Have you looked at the Graphviz modules on CPAN?
A decent starting point might be Config::General. It's a module that parses Apache-style config files, and I think it could get you off to the right start.
Here is an example file and how Config::General parses it.
# File blah.conf
<graph>
<node>
title = node1
<loc>
x = 10
y = 20
</loc>
</node>
<node>
title = node2
<loc>
x = 10
y = 20
</loc>
</node>
<edge>
sourcename = node1
targetname = node2
</edge>
</graph>
# After parsing file and placing into %conf you will have
%conf = (
    node => {
        title => {
            node1 => {
                loc => {
                    x => 10,
                    y => 20,
                },
            },
            node2 => {
                loc => {
                    x => 10,
                    y => 20,
                },
            },
        },
    },
    edge => {
        sourcename => 'node1',
        targetname => 'node2',
    },
);
# You can then reference pieces thereof via
$conf{node}{title}{node1}{loc}{x};
I might have messed up the nesting, as to how it builds the hashes, but I think the point is clear. Also, if memory serves, when you write something like x = 10 20 30 40, then $conf{x} will be an array ref containing 10, 20, 30 and 40. You can change the delimiter, which is whitespace by default, to something else, and so on and so forth with all the Perl yummy goodness.
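Since the hand-typed structure above may not match what the module really builds, a minimal sketch like the following dumps the actual result so you can check it. It assumes Config::General is installed from CPAN (it is not core Perl) and feeds it a shortened copy of the example through the `-String` option instead of a file; `$have_cg` is just a name made up for this sketch.

```perl
use strict;
use warnings;
use Data::Dumper;

# Config::General comes from CPAN, so load it defensively.
my $have_cg = eval { require Config::General; 1 };

# Shortened copy of the example config from this post.
my $text = <<'CONF';
<graph>
  <node>
    title = node1
    <loc>
      x = 10
      y = 20
    </loc>
  </node>
  <edge>
    sourcename = node1
    targetname = node2
  </edge>
</graph>
CONF

my %conf;
if ($have_cg) {
    # getall returns the parsed config as a plain hash.
    %conf = Config::General->new( -String => $text )->getall;
    local $Data::Dumper::Sortkeys = 1;
    print Dumper( \%conf );
}
else {
    print "Config::General is not installed\n";
}
```

Whatever the dump shows is the shape to code against, rather than the one typed from memory.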
So my suggestion would be to grab the module, look over its parsing and structure-building routines, and then abstract them out as you see fit. Porting the hash-of-hashes structure to, say, a C struct should be a reasonable exercise.
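To make the "port it to a struct" idea a little more concrete, here is a rough sketch in plain Perl that flattens a nested hash into one fixed-field record per node, which is roughly what an array of C structs would hold. The input is a hand-written literal in the same shape as the %conf shown earlier in this post, so treat that shape as an assumption; `@nodes`, `@edges` and `%index_of` are names invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical input, shaped like the %conf shown earlier in this post.
my %conf = (
    node => {
        title => {
            node1 => { loc => { x => 10, y => 20 } },
            node2 => { loc => { x => 10, y => 20 } },
        },
    },
    edge => { sourcename => 'node1', targetname => 'node2' },
);

# One flat record per node, like struct { char *title; int x, y; } in C.
my @nodes;
for my $title ( sort keys %{ $conf{node}{title} } ) {
    my $loc = $conf{node}{title}{$title}{loc};
    push @nodes, { title => $title, x => $loc->{x}, y => $loc->{y} };
}

# Resolve edge endpoints to indexes into @nodes, as a C port might do.
my %index_of = map { $nodes[$_]{title} => $_ } 0 .. $#nodes;
my @edges = (
    {
        source => $index_of{ $conf{edge}{sourcename} },
        target => $index_of{ $conf{edge}{targetname} },
    },
);

printf "%s (%d,%d)\n", @{$_}{qw(title x y)} for @nodes;
printf "edge %d -> %d\n", $_->{source}, $_->{target} for @edges;
```

Storing indexes instead of names in the edges keeps the records fixed-size, which is the part that maps most directly onto C.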
Best of luck and happy hacking
/* And the Creator, against his better judgement, wrote man.c */
Here's a basic parser for your example graph:
use strict;
use warnings;
use Data::Dumper;
use Regexp::Common;

# Building blocks: identifiers, bare values and balanced {...} bodies.
my $name   = qr/[a-z0-9_]+/i;
my $val    = qr/\S+/;
my $braces = qr/$RE{balanced}{-parens=>'{}'}/;
my $node   = qr/($name) \s+ ($braces)/x;
my $assign = qr/($name) [=:] \s* ($val) (?:\s+ | \})/x;

my $graph = <<TXT;
graph {
  node { title: "node1" loc {x: 10 y: 20} }
  node {
    title: "node2"
    loc {
      x: 10 y: 20
    }
  }
  edge
  {
    sourcename="node1"
    targetname="node2"
  }
}
TXT

print Dumper( parse_graph($graph) );

sub parse_graph {
    my $graph = shift;
    my $tree  = {};
    pos($graph) = 0;
    PARSE: {
        # Braces are escaped because newer perls reject a bare literal "{".
        if ($graph =~ /\G \{? \s* $assign/sgcx) {
            # A key/value pair: store it directly.
            $tree->{$1} = $2;
            redo PARSE;
        }
        elsif ($graph =~ /\G .*? $node/sgcx) {
            # A nested block: recurse into its braces.
            push @{ $tree->{$1} }, parse_graph($2);
            redo PARSE;
        }
    }
    return $tree;
}
__output__
$VAR1 = {
  'graph' => [
    {
      'edge' => [
        {
          'sourcename' => '"node1"',
          'targetname' => '"node2"'
        }
      ],
      'node' => [
        {
          'title' => '"node1"',
          'loc' => [
            {
              'x' => '10',
              'y' => '20'
            }
          ]
        },
        {
          'title' => '"node2"',
          'loc' => [
            {
              'x' => '10',
              'y' => '20'
            }
          ]
        }
      ]
    }
  ]
};
Will probably need tweaking for your own preferences but hopefully it's a start :)
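One tweak you may want is stripping the double quotes that the parser leaves around string values. The following is only a sketch of a post-processing pass; `unquote_tree` is a helper name made up here, and the sample tree is a hand-written fragment in the shape of the dump above, not real parser output.

```perl
use strict;
use warnings;

# Recursively strip one pair of surrounding double quotes from leaf values.
sub unquote_tree {
    my ($tree) = @_;
    for my $key ( keys %$tree ) {
        my $value = $tree->{$key};
        if ( ref $value eq 'ARRAY' ) {
            # Child blocks are stored as arrays of hashes.
            unquote_tree($_) for @$value;
        }
        else {
            $value =~ s/\A"(.*)"\z/$1/s;
            $tree->{$key} = $value;
        }
    }
    return $tree;
}

# Hand-written sample in the shape of the dump above.
my $tree = {
    graph => [
        {
            node => [
                { title => '"node1"', loc => [ { x => '10', y => '20' } ] },
            ],
        },
    ],
};

unquote_tree($tree);
print $tree->{graph}[0]{node}[0]{title}, "\n";    # prints node1
```

Unquoted values such as the coordinates pass through unchanged, since the substitution only fires when both quotes are present.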
HTH
broquaint
If you don't mind slurping the whole file into memory, you can do it with regexps. First divide the file into one part containing the nodes and another containing the edges. Then do a match similar to this:
while ($nodepart =~ m/node\s* \{\s* title:\s* "([^"]*)"\s*
                      loc\s* \{\s* x:\s* (\d+)\s* y:\s* (\d+)\s* \}\s*
                      \}\s*/gx) {
    build_the_structure_with_captured_node($1, $2, $3);
}
And another one for the edges.
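The edge pass might look something like the sketch below. The edge syntax is an assumption on my part (sourcename and targetname followed by either `:` or `=` and a quoted name), and `$edgepart` and `@edges` are names made up for the example, so adjust the pattern to whatever your files really contain.

```perl
use strict;
use warnings;

# Hypothetical edge section; the real file format may differ.
my $edgepart = <<'TXT';
edge { sourcename: "node1" targetname: "node2" }
edge {
    sourcename="node2"
    targetname="node1"
}
TXT

my @edges;
while ($edgepart =~ m/edge\s* \{\s*
                      sourcename\s* [:=]\s* "([^"]*)"\s*
                      targetname\s* [:=]\s* "([^"]*)"\s*
                      \}\s*/gx) {
    # Collect each edge as a [source, target] pair.
    push @edges, [ $1, $2 ];
}

print "$_->[0] -> $_->[1]\n" for @edges;
```

As with the node pattern, this only copes with fields in a fixed order; anything looser is a job for a real parser.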
Most languages have some kind of regexp library, so this approach should carry over.
Update: The s modifier was not needed; \s matches a newline without it.