Graph File Parsing

arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

I need to build a parser for the file format shown below. I have decided to hand-code it (as opposed to constructing a parser with Lex/Yacc) because it is comparatively simple, but I would be interested in any observations on the best way to approach parsing this file. The file represents a graph of vertices and edges:

graph {
  node { title: "node1" loc {x: 10 y: 20} }
  node {
    title: "node2"
    loc {
      x: 10 y: 20
    }
  }

  edge
  {
    sourcename="node1"
    targetname="node2"
  }
}

As you can see from the example, the file is well structured. I want to create a list of node structures and a list of edge structures from this file: a node has three properties (x, y and title), whilst an edge has just two, sourcename and targetname, which correspond to node titles. I am not interested in checking the file (or the graph) for validity; these files are generated by another program and are therefore always valid.

Can anyone help me with suitable code to load this data from the file, preferably without using any Perl-specific features? I am prototyping in Perl, but may eventually have to implement this parser in another language due to constraints imposed by others. Alternatively, any suggestions on how to do what I want in the simplest (but clear) way possible would be welcome.

Any help will be greatly appreciated. Thanks in advance,

____________
Arun
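A minimal sketch of the hand-coded approach, assuming (as stated above) that input files are always valid, that quoted strings never contain escaped quotes, and that every node has exactly one loc block. It uses a character-by-character tokenizer and a recursive-descent parser, which translate fairly directly to C or Java; Perl hashes and arrays stand in for the maps and lists another language would use. The names tokenize, parse_block and parse_graph_file are hypothetical.

```perl
use strict;
use warnings;

# Tokenizer: turns the text into [kind, text] pairs. 'punct' is one of the
# structural characters { } : = and 'word' is a name, a number or a
# double-quoted string (quotes removed; escapes inside strings are NOT
# handled, an assumption that holds for the sample file).
sub tokenize {
    my ($text) = @_;
    my @tokens;
    my ($i, $n) = (0, length $text);
    while ($i < $n) {
        my $c = substr($text, $i, 1);
        if ($c =~ /\s/) {
            $i++;                                   # skip whitespace
        } elsif (index('{}:=', $c) >= 0) {
            push @tokens, ['punct', $c];
            $i++;
        } elsif ($c eq '"') {
            my $end = index($text, '"', $i + 1);    # closing quote
            push @tokens, ['word', substr($text, $i + 1, $end - $i - 1)];
            $i = $end + 1;
        } else {
            my $start = $i;                         # bare word up to a delimiter
            $i++ while $i < $n && substr($text, $i, 1) !~ /[\s{}:="]/;
            push @tokens, ['word', substr($text, $start, $i - $start)];
        }
    }
    return \@tokens;
}

# Recursive descent: reads entries until the matching '}' (or end of input).
# "key: value" and "key=value" become plain entries; "name { ... }" becomes
# a list of child blocks, so repeated node/edge blocks are all kept.
sub parse_block {
    my ($tokens, $pos) = @_;                        # $pos: reference to index
    my %block;
    while ($$pos < @$tokens) {
        my ($kind, $text) = @{ $tokens->[$$pos++] };
        last if $kind eq 'punct' && $text eq '}';   # end of this block
        my ($next_kind, $next_text) = @{ $tokens->[$$pos++] };
        if ($next_kind eq 'punct' && $next_text eq '{') {
            push @{ $block{$text} }, parse_block($tokens, $pos);
        } else {                                    # ':' or '=' was consumed
            $block{$text} = $tokens->[$$pos++][1];  # so this is the value
        }
    }
    return \%block;
}

# Pull flat node and edge records out of the generic tree.
sub parse_graph_file {
    my ($text) = @_;
    my $pos = 0;
    my $top = parse_block(tokenize($text), \$pos);
    my $graph = $top->{graph}[0];
    my (@nodes, @edges);
    for my $node (@{ $graph->{node} || [] }) {
        my $loc = $node->{loc}[0];
        push @nodes, { title => $node->{title}, x => $loc->{x}, y => $loc->{y} };
    }
    for my $edge (@{ $graph->{edge} || [] }) {
        push @edges, { sourcename => $edge->{sourcename},
                       targetname => $edge->{targetname} };
    }
    return (\@nodes, \@edges);
}

my $sample = <<'END';
graph {
  node { title: "node1" loc {x: 10 y: 20} }
  node {
    title: "node2"
    loc {
      x: 10 y: 20
    }
  }
  edge
  {
    sourcename="node1"
    targetname="node2"
  }
}
END

my ($nodes, $edges) = parse_graph_file($sample);
print "node $_->{title} at ($_->{x}, $_->{y})\n" for @$nodes;
print "edge $_->{sourcename} -> $_->{targetname}\n" for @$edges;
```

Keeping the generic tree separate from the node/edge extraction means that supporting a new property only touches parse_graph_file, not the tokenizer or the parser.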


Re: Graph File Parsing by larsen (Parson) on Mar 10, 2003 at 13:55 UTC
It seems you're trying to parse Graphviz files. Do you know of Graphviz CPAN modules?
Re: Graph File Parsing by l2kashe (Deacon) on Mar 10, 2003 at 13:39 UTC
A decent starting point might be Config::General. It's a module which parses Apache-style config files, and I think it could get you off to the right start. Here is an example file and how Config::General parses it:

# File blah.conf
<graph>
  <node>
    title = node1
    <loc>
      x = 10
      y = 20
    </loc>
  </node>
  <node>
    title = node2
    <loc>
      x = 10
      y = 20
    </loc>
  </node>
  <edge>
    sourcename = node1
    targetname = node2
  </edge>
</graph>

# After parsing the file and placing it into %conf you will have
%conf = (
  node => {
    title => {
      node1 => {
        loc => { x => 10, y => 20 },
      },
      node2 => {
        loc => { x => 10, y => 20 },
      },
    },
  },
  edge => {
    sourcename => 'node1',
    targetname => 'node2',
  },
);

# You can then reference pieces thereof via
$conf{node}{title}{node1}{loc}{x};

I might have messed up the nesting (how it builds the hashes), but I think the point is clear. Also, if you do something like x = 10 20 30 40, when you reference $conf{x} it will return an array ref, the contents of which are 10, 20, 30, 40. You can change the delimiter, which is whitespace by default, to something else, and so on with all the Perl yummy goodness.

So my suggestion would be to grab the module, look over its parsing and building routines, and then abstract them out as you see fit. It should be a reasonable exercise to port the hash-of-hashes structure to, say, a C struct.

Best of luck and happy hacking

/* And the Creator, against his better judgement, wrote man.c */
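To illustrate the idea of studying a module's parser and porting it, here is a hypothetical, much-simplified stack-based reader for the Apache-style syntax above. It is not Config::General's code, and its output shape is an assumption made for this sketch: every block is stored in an array under its name, so repeated <node> blocks are all kept. It also assumes one tag or one key = value pair per line; read_blocks is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical, much-simplified reader for Apache-style blocks; NOT the
# Config::General implementation. Assumes one "<name>", "</name>" or
# "key = value" item per line; other lines (e.g. comments) are ignored.
sub read_blocks {
    my ($text) = @_;
    my $root  = {};
    my @stack = ($root);                      # innermost open block is last
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next if $line eq '';
        if ($line =~ m{^</\w+>$}) {           # closing tag: leave the block
            pop @stack;
        } elsif ($line =~ m{^<(\w+)>$}) {     # opening tag: enter a new block
            my $child = {};
            push @{ $stack[-1]{$1} }, $child; # repeated blocks form a list
            push @stack, $child;
        } elsif ($line =~ /^(\w+)\s*=\s*(.*)$/) {
            $stack[-1]{$1} = $2;              # plain key = value pair
        }
    }
    return $root;
}

my $conf = read_blocks(<<'END');
<graph>
  <node>
    title = node1
    <loc>
      x = 10
      y = 20
    </loc>
  </node>
  <node>
    title = node2
    <loc>
      x = 10
      y = 20
    </loc>
  </node>
  <edge>
    sourcename = node1
    targetname = node2
  </edge>
</graph>
END

for my $node (@{ $conf->{graph}[0]{node} }) {
    my $loc = $node->{loc}[0];
    print "$node->{title}: x=$loc->{x} y=$loc->{y}\n";
}
```

The explicit stack of open blocks is the part that ports most easily: in C it becomes an array of pointers to the current struct at each nesting level.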
Re: Graph File Parsing by broquaint (Abbot) on Mar 10, 2003 at 14:40 UTC
Here's a basic parser for your example graph:

use strict;
use Data::Dumper;
use Regexp::Common;

my $name   = qr/[a-z0-9_]+/i;
my $val    = qr/\S+/;
my $braces = qr/$RE{balanced}{-parens=>'{}'}/;
my $node   = qr/($name) \s+ ($braces)/x;
my $assign = qr/($name) [=:] \s* ($val) (?:\s+ | })/x;

my $graph = <<TXT;
graph {
  node { title: "node1" loc {x: 10 y: 20} }
  node {
    title: "node2"
    loc {
      x: 10 y: 20
    }
  }
  edge
  {
    sourcename="node1"
    targetname="node2"
  }
}
TXT

print Dumper( parse_graph($graph) );

sub parse_graph {
  my $graph = shift;
  my $tree  = {};
  pos($graph) = 0;
  PARSE: {
    if($graph =~ /\G \{? \s* $assign/sgcx) {
      $tree->{$1} = $2;
      redo PARSE;
    }
    elsif($graph =~ /\G .*? $node/sgcx) {
      push @{ $tree->{$1} }, parse_graph($2);
      redo PARSE;
    }
  }
  return $tree;
}

__output__
$VAR1 = {
  'graph' => [
    {
      'edge' => [
        {
          'sourcename' => '"node1"',
          'targetname' => '"node2"'
        }
      ],
      'node' => [
        {
          'title' => '"node1"',
          'loc' => [
            {
              'x' => '10',
              'y' => '20'
            }
          ]
        },
        {
          'title' => '"node2"',
          'loc' => [
            {
              'x' => '10',
              'y' => '20'
            }
          ]
        }
      ]
    }
  ]
};

It will probably need tweaking for your own preferences, but hopefully it's a start :)

HTH

_________
broquaint
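broquaint's parse_graph keeps the double quotes around values and wraps every block in an array. A small follow-up pass can turn that tree into the flat node and edge lists the question asks for. This sketch types in a tree shaped like the __output__ dump by hand, so it runs without Regexp::Common; flatten_graph and unquote are hypothetical names.

```perl
use strict;
use warnings;

# Strip one pair of surrounding double quotes, if present.
sub unquote { my ($s) = @_; $s =~ s/^"(.*)"$/$1/; return $s }

# Turn the nested tree returned by parse_graph() into flat lists of nodes
# and edges. Assumes each node has a single loc block, as in the sample.
sub flatten_graph {
    my ($tree) = @_;
    my (@nodes, @edges);
    for my $graph (@{ $tree->{graph} }) {
        for my $n (@{ $graph->{node} || [] }) {
            my $loc = $n->{loc}[0];
            push @nodes, { title => unquote($n->{title}),
                           x => $loc->{x}, y => $loc->{y} };
        }
        for my $e (@{ $graph->{edge} || [] }) {
            push @edges, { sourcename => unquote($e->{sourcename}),
                           targetname => unquote($e->{targetname}) };
        }
    }
    return (\@nodes, \@edges);
}

# A tree shaped like the __output__ above, typed in by hand for this example.
my $tree = { graph => [ {
    edge => [ { sourcename => '"node1"', targetname => '"node2"' } ],
    node => [
        { title => '"node1"', loc => [ { x => '10', y => '20' } ] },
        { title => '"node2"', loc => [ { x => '10', y => '20' } ] },
    ],
} ] };

my ($nodes, $edges) = flatten_graph($tree);
print "node $_->{title} at ($_->{x}, $_->{y})\n" for @$nodes;
print "edge $_->{sourcename} -> $_->{targetname}\n" for @$edges;
```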
Re: Graph File Parsing by zby (Vicar) on Mar 10, 2003 at 14:03 UTC
If you don't mind slurping the whole file into memory, you can do it with regexps. First divide the file into a part containing the nodes and another containing the edges. Then do a match similar to this:

while ($nodepart =~ m/node\s* \{\s* title:\s* "([^"]*)"\s* loc\s* \{\s* x:\s* (\d+)\s* y:\s* (\d+)\s* \}\s* \}/gx) {
  build_the_structure_with_captured_node($1, $2, $3);
}

And another one for the edges. Most languages have some regexp library, so this can work.

Update: The s modifier was not needed; \s matches a newline without it.
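A runnable sketch of the regex approach, with the edge pattern filled in. Because each pattern starts with the node or edge keyword, this version scans the whole text twice instead of first splitting it into node and edge parts. Like the reply above, it assumes the field order of the sample file and non-negative integer coordinates; extract_graph is a hypothetical name.

```perl
use strict;
use warnings;

# Regex-only extraction: one global match loop for node blocks and a
# second one for edge blocks. Field order must match the sample file.
sub extract_graph {
    my ($text) = @_;
    my (@nodes, @edges);
    while ($text =~ m/node\s* \{\s* title:\s* "([^"]*)"\s* loc\s* \{\s*
                      x:\s* (\d+)\s* y:\s* (\d+)\s* \}\s* \}/gx) {
        push @nodes, { title => $1, x => $2, y => $3 };
    }
    # The node loop ended with a failed match, which resets pos($text),
    # so this second scan starts again from the beginning of the text.
    while ($text =~ m/edge\s* \{\s* sourcename\s* =\s* "([^"]*)"\s*
                      targetname\s* =\s* "([^"]*)"\s* \}/gx) {
        push @edges, { sourcename => $1, targetname => $2 };
    }
    return (\@nodes, \@edges);
}

my $sample = <<'END';
graph {
  node { title: "node1" loc {x: 10 y: 20} }
  node {
    title: "node2"
    loc {
      x: 10 y: 20
    }
  }
  edge
  {
    sourcename="node1"
    targetname="node2"
  }
}
END

my ($nodes, $edges) = extract_graph($sample);
print "nodes: ", scalar(@$nodes), ", edges: ", scalar(@$edges), "\n";
```

The trade-off is brittleness: a generator that reorders fields or adds a new property silently breaks the match, whereas a tokenizer-based parser would simply store the extra entry.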

