PerlMonks  

Graph File Parsing

by arunhorne (Pilgrim)
on Mar 10, 2003 at 13:10 UTC (id://241714)

arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I need to build a parser for the file format shown below. I have decided to hand-code it (rather than generating a parser with Lex/Yacc) because the format is comparatively simple, but I would be interested in any observations on the best way to approach parsing it. The file represents a graph of vertices and edges:

    graph {
        node { title: "node1" loc {x: 10 y: 20} }
        node { title: "node2" loc { x: 10 y: 20 } }
        edge { sourcename="node1" targetname="node2" }
    }

As you can see from the example, the file is well structured. I want to build a list of node structures and a list of edge structures from it: a node has three properties (title, x and y), while an edge has just two, sourcename and targetname, which refer to node titles. I am not interested in checking the file (or the graph) for validity; these files are generated by another program and are therefore always valid.

Can anyone help me with suitable code to load this data from the file, preferably without using any Perl-specific features? I am prototyping in Perl but may eventually have to implement this parser in another language because of constraints imposed by others. Failing that, any suggestions on how to do this in the simplest (but clearest) way possible would be welcome.
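A minimal hand-written sketch of what is being asked for, assuming well-formed input as the question states (no escaped quotes, every block closed). It splits the work into a character-by-character tokenizer and a small recursive-descent block parser, avoiding regexps so the two loops could be translated to C or Java fairly directly; Perl hashes stand in for structs. The names tokenize, parse_block and parse_graph_file are hypothetical, not from any library.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: splits the text into '{', '}', '=', ':',
# quoted strings (quotes removed) and bare words/numbers. It scans one
# character at a time, so the same loop ports directly to C or Java.
# Assumes well-formed input with no escaped quotes inside strings.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    my $i = 0;
    my $n = length $text;
    while ($i < $n) {
        my $c = substr($text, $i, 1);
        if (index(" \t\r\n", $c) >= 0) {
            $i++;                                      # skip whitespace
        } elsif (index("{}=:", $c) >= 0) {
            push @tokens, $c;                          # single-char punctuation
            $i++;
        } elsif ($c eq '"') {
            my $end = index($text, '"', $i + 1);       # position of closing quote
            push @tokens, substr($text, $i + 1, $end - $i - 1);
            $i = $end + 1;
        } else {
            my $start = $i;                            # bare word or number
            while ($i < $n && index(" \t\r\n{}=:\"", substr($text, $i, 1)) < 0) {
                $i++;
            }
            push @tokens, substr($text, $start, $i - $start);
        }
    }
    return @tokens;
}

# Hypothetical recursive-descent step: $$pos indexes the '{' that opens a
# block. "key: value" / "key=value" pairs become scalar entries; nested
# "key { ... }" blocks are pushed onto a list under that key, so repeated
# node/edge blocks are kept in order.
sub parse_block {
    my ($tokens, $pos) = @_;
    my %block;
    $$pos++;                                           # skip '{'
    while (defined $tokens->[$$pos] && $tokens->[$$pos] ne '}') {
        my $key  = $tokens->[$$pos];
        my $next = $tokens->[$$pos + 1];
        if ($next eq '=' || $next eq ':') {
            $block{$key} = $tokens->[$$pos + 2];       # key, separator, value
            $$pos += 3;
        } else {                                       # key followed by '{'
            $$pos++;
            push @{ $block{$key} }, parse_block($tokens, $pos);
        }
    }
    $$pos++;                                           # skip '}'
    return \%block;
}

# Builds the two flat lists the question asks for.
sub parse_graph_file {
    my ($text) = @_;
    my @tokens = tokenize($text);
    my $pos = 1;                                       # token 0 is 'graph'
    my $graph = parse_block(\@tokens, \$pos);
    my (@nodes, @edges);
    for my $nd (@{ $graph->{node} || [] }) {
        my $loc = $nd->{loc}[0];
        push @nodes, { title => $nd->{title}, x => $loc->{x}, y => $loc->{y} };
    }
    for my $ed (@{ $graph->{edge} || [] }) {
        push @edges, { sourcename => $ed->{sourcename}, targetname => $ed->{targetname} };
    }
    return (\@nodes, \@edges);
}

my $text = 'graph { node { title: "node1" loc {x: 10 y: 20} } '
         . 'node { title: "node2" loc { x: 10 y: 20 } } '
         . 'edge { sourcename="node1" targetname="node2" } }';
my ($nodes, $edges) = parse_graph_file($text);
print "node $_->{title} at ($_->{x}, $_->{y})\n" for @$nodes;
print "edge $_->{sourcename} -> $_->{targetname}\n" for @$edges;
```

Keeping tokenizing and parsing separate means a port only has to reimplement two small loops rather than translate regular expressions.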

Any help will be greatly appreciated, thanks in advance,

____________
Arun

Re: Graph File Parsing
by larsen (Parson) on Mar 10, 2003 at 13:55 UTC
    It seems you're trying to parse Graphviz files. Are you aware of the Graphviz modules on CPAN?
Re: Graph File Parsing
by l2kashe (Deacon) on Mar 10, 2003 at 13:39 UTC
    A decent starting point might be to look at Config::General. It's a module which parses Apache-style config files, and I think it could get you off to the right start.

    Here is an example file and how Config::General parses it:

        # File blah.conf
        <graph>
          <node>
            title = node1
            <loc>
              x = 10
              y = 20
            </loc>
          </node>
          <node>
            title = node2
            <loc>
              x = 10
              y = 20
            </loc>
          </node>
          <edge>
            sourcename = node1
            targetname = node2
          </edge>
        </graph>

        # After parsing the file and placing it into %conf you will have
        %conf = (
            node => {
                title => {
                    node1 => { loc => { x => 10, y => 20 } },
                    node2 => { loc => { x => 10, y => 20 } },
                },
            },
            edge => {
                sourcename => 'node1',
                targetname => 'node2',
            },
        );

        # You can then reference pieces of it via
        $conf{node}{title}{node1}{loc}{x};
    I might have messed up exactly how it nests the hashes, but I think the point is clear. Also, if you write something like x = 10 20 30 40, then $conf{x} will return an array ref whose contents are 10, 20, 30, 40. You can change the delimiter (whitespace by default) to something else, and so on, with all the Perl yummy goodness.

    So my suggestion would be to grab the module, look over its parsing and structure-building routines, and then abstract them out as you see fit. It should be a reasonable exercise to port the hash-of-hashes structure to, say, a C struct.

    Best of luck and happy hacking

    /* And the Creator, against his better judgement, wrote man.c */
Re: Graph File Parsing
by broquaint (Abbot) on Mar 10, 2003 at 14:40 UTC
    Here's a basic parser for your example graph
    use strict;
    use Data::Dumper;
    use Regexp::Common;

    my $name   = qr/[a-z0-9_]+/i;
    my $val    = qr/\S+/;
    my $braces = qr/$RE{balanced}{-parens=>'{}'}/;
    my $node   = qr/($name) \s+ ($braces)/x;
    my $assign = qr/($name) [=:] \s* ($val) (?:\s+ | })/x;

    my $graph = 'graph { node { title: "node1" loc {x: 10 y: 20} } '
              . 'node { title: "node2" loc { x: 10 y: 20 } } '
              . 'edge { sourcename="node1" targetname="node2" } }';

    print Dumper( parse_graph($graph) );

    sub parse_graph {
        my $graph = shift;
        my $tree  = {};
        pos($graph) = 0;
        PARSE: {
            # a "key: value" or "key=value" pair, possibly right after the block's opening brace
            if ($graph =~ /\G \{? \s* $assign/sgcx) {
                $tree->{$1} = $2;
                redo PARSE;
            }
            # a "name { ... }" block: recurse into its braces
            elsif ($graph =~ /\G .*? $node/sgcx) {
                push @{ $tree->{$1} }, parse_graph($2);
                redo PARSE;
            }
        }
        return $tree;
    }

    __output__
    $VAR1 = {
      'graph' => [
        {
          'edge' => [
            { 'sourcename' => '"node1"', 'targetname' => '"node2"' }
          ],
          'node' => [
            { 'title' => '"node1"', 'loc' => [ { 'x' => '10', 'y' => '20' } ] },
            { 'title' => '"node2"', 'loc' => [ { 'x' => '10', 'y' => '20' } ] }
          ]
        }
      ]
    };
    Will probably need tweaking for your own preferences but hopefully it's a start :)
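One likely tweak: the tree above still carries the double quotes around string values and nests loc one level deeper than the flat node and edge lists the question asks for. A minimal sketch of that conversion, assuming the tree has exactly the shape shown in the output above; unquote and tree_to_lists are hypothetical helper names, and the tree is hard-coded so the sketch runs without Regexp::Common.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one pair of surrounding double quotes, if present.
sub unquote {
    my ($s) = @_;
    $s =~ s/^"(.*)"$/$1/;
    return $s;
}

# Hypothetical helper: flatten the parse_graph() tree into node and edge lists.
sub tree_to_lists {
    my ($tree) = @_;
    my $graph = $tree->{graph}[0];
    my (@nodes, @edges);
    for my $nd (@{ $graph->{node} || [] }) {
        my $loc = $nd->{loc}[0];                 # loc is a one-element list of hashes
        push @nodes, { title => unquote($nd->{title}), x => $loc->{x}, y => $loc->{y} };
    }
    for my $ed (@{ $graph->{edge} || [] }) {
        push @edges, { sourcename => unquote($ed->{sourcename}),
                       targetname => unquote($ed->{targetname}) };
    }
    return (\@nodes, \@edges);
}

# Same structure as the Data::Dumper output above.
my $tree = {
    graph => [ {
        edge => [ { sourcename => '"node1"', targetname => '"node2"' } ],
        node => [
            { title => '"node1"', loc => [ { x => '10', y => '20' } ] },
            { title => '"node2"', loc => [ { x => '10', y => '20' } ] },
        ],
    } ],
};

my ($nodes, $edges) = tree_to_lists($tree);
print "$_->{title}: x=$_->{x} y=$_->{y}\n" for @$nodes;
print "$_->{sourcename} -> $_->{targetname}\n" for @$edges;
```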
    HTH

    _________
    broquaint

Re: Graph File Parsing
by zby (Vicar) on Mar 10, 2003 at 14:03 UTC
    If you don't mind slurping the whole file into memory, you can do it with regexps. First divide the file into a part containing the nodes and another containing the edges, then do a match similar to this:
    while ($nodepart =~ m/node\s* \{\s* title:\s* "([^"]*)"\s*
                          loc\s* \{\s* x:\s* (\d+)\s* y:\s* (\d+)\s* \}\s* \}\s*/gx) {
        build_the_structure_with_captured_node($1, $2, $3);
    }
    And another one for the edges.
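A self-contained sketch of this approach with the edge regex written out. It assumes the fields always appear in the order shown in the example. Because each regex is anchored on its own keyword (node or edge) followed by a brace, this sketch skips the "divide the file" step and runs both loops over the whole text; @nodes and @edges are illustrative names.

```perl
use strict;
use warnings;

my $text = 'graph { node { title: "node1" loc {x: 10 y: 20} } '
         . 'node { title: "node2" loc { x: 10 y: 20 } } '
         . 'edge { sourcename="node1" targetname="node2" } }';

my (@nodes, @edges);

# /g walks through successive matches; /x lets the pattern be spaced out.
while ($text =~ m/node\s* \{\s* title:\s* "([^"]*)"\s*
                  loc\s* \{\s* x:\s* (\d+)\s* y:\s* (\d+)\s* \}\s* \}/gx) {
    push @nodes, { title => $1, x => $2, y => $3 };
}

# pos($text) was reset when the loop above failed, so this scans from the start.
while ($text =~ m/edge\s* \{\s* sourcename\s* =\s* "([^"]*)"\s*
                  targetname\s* =\s* "([^"]*)"\s* \}/gx) {
    push @edges, { sourcename => $1, targetname => $2 };
}

print "node $_->{title} at ($_->{x}, $_->{y})\n" for @nodes;
print "edge $_->{sourcename} -> $_->{targetname}\n" for @edges;
```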

    Most languages have some regexp library, so this approach should carry over.

    Update: The s modifier was not needed - \s matches a newline without it.
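To illustrate the update: the /s modifier only changes what . matches, while \s always includes the newline. A small check, using an illustrative string with a line break after title:

```perl
use strict;
use warnings;

my $s = "title:\n\"node1\"";

# \s is a character class that always includes "\n" ...
my $backslash_s = ($s =~ /title:\s"/)  ? 1 : 0;
# ... whereas . refuses to match "\n" unless /s is given.
my $dot_plain   = ($s =~ /title:."/)   ? 1 : 0;
my $dot_with_s  = ($s =~ /title:."/s)  ? 1 : 0;

print "$backslash_s $dot_plain $dot_with_s\n";   # prints "1 0 1"
```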

Node Type: perlquestion [id://241714]
Approved by Tomte