PerlMonks  

Re: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)

by ambrus (Abbot)
on Oct 16, 2004 at 20:31 UTC ( [id://399813]=note )


in reply to Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)

This is a tricky question, especially because some of the parenthesized entries in the input contain only a word, some contain both a word and :foo(...) pairs, and some contain only :foo(...) pairs, so it's not obvious what data structure to use.

Here's my guess at interpreting it (you might want to tidy it up a bit, of course, for example by changing what is and isn't allowed in words, or by declaring the variables with my).

use Data::Dumper; $s = \%p; @s = (); while (<>) { while (/\G\s*(?:([-\w.]+)|:([-\w.]+)\s*\(|(\)))/gc) { if (defined($1)) { defined($$s{""}) and die "parse error: two"; $$s{""} = $1; } elsif (defined($2)) { push @s, $s; $s = $$s{$2} = {}; } elsif (defined($3)) { @s or die "parse error: close"; $s = pop @s; } } /(\S.*)/g and die "parse error: junk: $1"; } $! and die "read error"; $s == \%p or die "parse error: open"; print Dumper(\%p);
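To make the resulting shape concrete, here is a minimal sketch of the same token loop wrapped in a sub that reads from a string. The sub name `parse_string`, the lexical variable names, and the sample line are made up for illustration and are not taken from the original question. The sample covers the three cases mentioned above: an entry with only a word (`port`), one with a word and pairs (`host`), and one with only pairs (`opts`).

```perl
use strict;
use warnings;
use Data::Dumper;

# Same token loop as the script above, but reading from a string
# and using lexical variables (names are illustrative).
sub parse_string {
    my ($text) = @_;
    my %root;
    my $cur = \%root;    # hash currently being filled
    my @stack;           # enclosing hashes, innermost last
    for (split /^/, $text) {
        while (/\G\s*(?:([-\w.]+)|:([-\w.]+)\s*\(|(\)))/gc) {
            if (defined $1) {          # bare word: stored under the "" key
                defined $$cur{""} and die "parse error: two";
                $$cur{""} = $1;
            } elsif (defined $2) {     # ":key (" opens a nested hash
                push @stack, $cur;
                $cur = $$cur{$2} = {};
            } else {                   # ")" returns to the enclosing hash
                @stack or die "parse error: close";
                $cur = pop @stack;
            }
        }
        /(\S.*)/g and die "parse error: junk: $1";
    }
    $cur == \%root or die "parse error: open";
    return \%root;
}

# Made-up input, assumed to resemble the format in the question.
my $sample = ":host (alpha :port (8080) :opts (:debug (on)))\n";
my $tree = parse_string($sample);
$Data::Dumper::Sortkeys = 1;
print Dumper($tree);
```

With this layout a word always lives under the empty-string key of its own hash, so `$tree->{host}{""}` is the word and `$tree->{host}{port}` is a nested entry, whichever combination of the two an entry has.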

Update 2006 jun 2: this works only for the examples in the node, not for the full example, it appears.

Re^2: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)
by idnopheq (Chaplain) on Oct 16, 2004 at 21:32 UTC
    THX, ambrus!

    I'm running with chromatic's idea for the moment.

    I like yours as well at first glance, since it only requires Data::Dumper, which every perl-enabled machine has by default.

    THX
    --
    idnopheq
    Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.

Re^2: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)
by ambrus (Abbot) on Jul 30, 2006 at 22:36 UTC

    Here's a corrected version that can parse the long sample on your scratchpad (which I also copy below so that it doesn't disappear unexpectedly). The original script (the one in the parent node) couldn't parse the longer sample because the sample has features that I couldn't have guessed from the small samples in the node, and at the time I wrote it you hadn't given us the long sample. There are three differences: first, this script expects the data to start with an opening parenthesis; second, it accepts a lone colon instead of a colon followed by a keyword; third, it accepts double-quoted strings.

    perl -we 'use Data::Dumper; $s = \%p; @s = (); while (<>) { our $f++ or $_ = ": " . $_; while (/\G\s*(?:([-\w.]+|"[^"]*")|:([-\w.]*)\s*\(|(\)))/gc) { if (defined($1)) { defined($$s{""}) and die "parse error: two"; $$s{"@"} = $1; } elsif (defined($2)) { push @s, $s; $s = $$s{$2} = {}; } elsif (defined($3)) { @s or die "parse error: close"; $s = pop @s; } } /(\S.*)/g and die "parse error: junk: $1"; } $! and die "read error"; $s == \%p or die "parse error: open"; print Dumper(\%p);'
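The three differences can be seen by running the revised pattern over a single line and listing the tokens it finds. This is only a sketch: the sample line and the token labels are made up, and the pattern is copied from the one-liner above. Note that a lone colon opens a child under the empty-string key, which is presumably why the word itself is now stored under "@" rather than "".

```perl
use strict;
use warnings;

# Pattern copied from the one-liner: a word or double-quoted string,
# a colon with an optional keyword and "(", or a closing ")".
my $token = qr/\G\s*(?:([-\w.]+|"[^"]*")|:([-\w.]*)\s*\(|(\)))/;

# Made-up first line of a file. The script prepends ": " to the first
# line, so the leading "(" is read as a lone-colon opener.
my $line = ": " . qq{(:name ("two words") : (x))};

my @tokens;
while ($line =~ /$token/gc) {
    push @tokens, defined $1 ? "word $1"
                : defined $2 ? "open '$2'"
                :              "close";
}
print "$_\n" for @tokens;
```

The quoted string comes back with its quotes still attached, which matches the one-liner's behaviour.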

    A historical note: I made the correction because someone asked on an IRC channel how to parse a file in this exact format.

    Here's the long sample

    Update: a version of the above converted to a real script (not a one-liner using global variables) is here. This one also removes double-quotes from double-quoted strings and accepts multi-line strings. The file format has backslash-escaped double quotes in double-quoted strings it seems, and possibly other things this can't parse.

    use warnings; use strict;
    use Data::Dumper;

    sub parse {
        my($f) = @_;
        my($s, %p, @s, $b);
        $s = \%p;
        while (<$f>) {
            $b++ or $_ = ": " . $_;
            while (/\G\s*(?:([-\w.]+)|"([^"]*)"|("[^"]*$)|:([-\w.]*)\s*\(|(\)))/gc) {
                if (defined($1) || defined($2)) {
                    defined($$s{""}) and die "parse error: two";
                    $$s{"@"} = defined($1) ? $1 : $2;
                } elsif (defined($3)) {
                    $_ = $+ . <$f>;
                } elsif (defined($4)) {
                    push @s, $s;
                    $s = $$s{$+} = {};
                } elsif (defined($5)) {
                    @s or die "parse error: close";
                    $s = pop @s;
                }
            }
            /(\S.*)/g and die "parse error: junk: $1";
        }
        $! and die "read error";
        $s == \%p or die "parse error: open";
        \%p;
    }

    my $p = parse(*ARGV);
    print Dumper($p);

    __END__

    Update: defined($$s{""}) and die "parse error: two"; should be changed to defined($$s{"@"}) and die "parse error: two"; in both scripts, I believe.
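On the backslash-escaped double quotes mentioned above: one possible way to recognise such strings is sketched below. It is not part of the scripts in this node, and it assumes that a backslash simply makes the following character literal, which the file format may or may not actually do. The names `$quoted` and `unquote` are made up for the example.

```perl
use strict;
use warnings;

# Assumption: inside double quotes, a backslash escapes the next character.
# The body is any run of non-quote, non-backslash characters or
# backslash-plus-one-character pairs; /s lets it span several lines.
my $quoted = qr/"((?:[^"\\]|\\.)*)"/s;

# Return the unescaped contents of a complete quoted string,
# or nothing if the text is not one.
sub unquote {
    my ($text) = @_;
    $text =~ /\A$quoted\z/ or return;
    (my $value = $1) =~ s/\\(.)/$1/gs;    # drop the escaping backslashes
    return $value;
}

print unquote('"say \"hi\" now"'), "\n";
```

To try it in the full script, the `"([^"]*)"` alternative would need the same body pattern, and the unterminated-string alternative would need a matching change.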
