PerlMonks  

String parsing

by hotshot (Prior)
on Jan 05, 2004 at 10:22 UTC ( [id://318803]=perlquestion )

hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hello all!

I have a variable that can hold one of the following strings:

    "OFF"
    "SUCCESS: (abc)"
    "ERROR(1): disk number (27) crashed at (11:03)"
    "WARNING(1): system is rebooting"

These strings are output generated by a script I run. As you can see, the first word is the status itself; the first parenthesis (before the colon) holds the exit status, which does not always exist; and after the colon is a string with variables in parentheses.
I need to extract the status, the exit status (if it exists), and the variables (if they exist), and then display a new string with placeholders for the variables taken from the output above.
How can I easily extract all I need from the string output by the script (a regexp or something)? In the end I need an array holding the status followed by the exit status in the first entry (for example: OFF, ERROR2, WARNING1), and the variables in the next entries, for example:

    @neededResultEx1 = ('OFF');
    @neededResultEx2 = ('SUCCESS', 'abc');
    @neededResultEx3 = ('ERROR1', '27', '11:03');

Anyone?

Replies are listed 'Best First'.
Re: String parsing
by Abigail-II (Bishop) on Jan 05, 2004 at 10:54 UTC
    #!/usr/bin/perl

    use strict;
    use warnings;

    $" = ", ";

    while (<DATA>) {
        /^(\w+)(?:[(](\d+)[)])?:?/g or next;
        my ($status, $exit) = ($1, $2);
        my (@vars) = /\G[^(]*[(]([^)]*)[)]/g;
        print "Status: $status; ";
        print "Exit: $exit; "     if defined $exit;
        print "Variables [@vars]" if @vars;
        print "\n";
    }

    __DATA__
    OFF
    SUCCESS: (abc)
    ERROR(1): disk number (27) crashed at (11:03)
    WARNING(1): system is rebooting

    Output:

    Status: OFF;
    Status: SUCCESS; Variables [abc]
    Status: ERROR; Exit: 1; Variables [27, 11:03]
    Status: WARNING; Exit: 1;

    Abigail

Re: String parsing
by Zaxo (Archbishop) on Jan 05, 2004 at 10:54 UTC

    Your data lines are different enough that each type needs its own parser. A hash of coderefs ("dispatch table") will do that nicely,

    my $parser = {
        OFF      => sub { () },
        # call these in list context!
        SUCCESS  => sub { local $_ = shift; /\((\w*)\)/ },
        ERROR1   => sub {
            local $_ = shift;
            /disk number \((\d+)\) crashed at \((\d+:\d+)\)/;
        },
        WARNING1 => sub { local $_ = shift; /(\w.*)$/; }
    };

    sub parse_line {
        local $_ = shift;
        my ($key, $data) = split ':', $_, 2;
        $key =~ tr/()//d;
        ($key, $parser->{$key}->($data));
    }

    parse_line() should be called in list context, too.
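
Why list context matters here can be shown with a small sketch. It is not part of the reply above; the variable names are made up for illustration. In scalar context a match only reports success, while in list context it returns the captured groups:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = ' (abc)';

# Scalar context: the match returns true (1) on success, not the capture.
my $matched = $data =~ /\((\w*)\)/;

# List context: the match returns the list of captured groups.
my ($var) = $data =~ /\((\w*)\)/;

# Building a result list puts the match in list context as well.
my @result = ('SUCCESS', $data =~ /\((\w*)\)/);

print "scalar: $matched\n";    # scalar: 1
print "list:   $var\n";        # list:   abc
print "result: @result\n";     # result: SUCCESS abc
```
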

    After Compline,
    Zaxo

      Your data lines are different enough that each type needs its own parser.
      Uhm, no, as shown in several other replies.
      A hash of coderefs ("dispatch table") will do that nicely,
      Actually, your solution is very inflexible. It can't even deal with:
      WARNING(2): system is rebooting
      (only the exit value is different from the original). It'll produce an error, as Perl will try to use an undefined value as a code reference.

      Abigail
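
The failure described above, and one possible guard against it, can be sketched as follows. This is only an illustration under assumptions: %handler_for, $default and parse_line_safe are hypothetical names, and falling back to "collect everything in parentheses" is one choice among several.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dispatch table keyed on status plus exit code.
my %handler_for = (
    OFF      => sub { () },
    WARNING1 => sub { () },
);

# Fallback for keys without a handler (an assumption of this sketch):
# return every parenthesised value found in the message.
my $default = sub {
    my ($data) = @_;
    return defined $data ? $data =~ /\(([^)]*)\)/g : ();
};

sub parse_line_safe {
    my ($line) = @_;
    my ($key, $data) = split /:/, $line, 2;
    $key =~ tr/()//d;
    my $handler = $handler_for{$key} || $default;
    return ($key, $handler->($data));
}

# Unguarded lookup: there is no WARNING2 entry, so calling it dies.
my $ok = eval { $handler_for{WARNING2}->(' system is rebooting'); 1 };
print "unguarded call died: $@" unless $ok;

# Guarded lookup: the fallback handler is used instead.
my @got = parse_line_safe('WARNING(2): system is rebooting');
print "guarded: @got\n";    # guarded: WARNING2
```

A table with a fallback still needs a decision about what an unknown status should mean; dying with a clear message is an equally reasonable default.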

Re: String parsing
by tachyon (Chancellor) on Jan 05, 2004 at 11:07 UTC

    Hard to do robustly with a regex

    while (<DATA>) {
        chomp;
        my ( @bits, $rest );
        ( $bits[0], $rest ) = split ':', $_, 2;
        $bits[0] =~ tr/()//d;
        push @bits, ($rest =~ m!\(([^)]+)!g) if $rest;
        print "@bits\n";
    }

    __DATA__
    OFF
    SUCCESS: (abc)
    ERROR(1): disk number (27) crashed at (11:03)
    WARNING(1): system is rebooting

    Which gives you

    OFF
    SUCCESS abc
    ERROR1 27 11:03
    WARNING1

    cheers

    tachyon

      Don't you just wish that repeat counts would repeat enclosed captures? Then you could use

      m[ ( ^ [^(:]+ ) (?: (?: \( ( \d+ ) \) )? : (?: .*? \( ( [^)]+ ) \) ){1,} .* )? $ ]x;
      to grab repetitive elements instead of having to do (or rather not do) things with built-in limits, like this:
      #! perl -slw
      use strict;

      while( <DATA> ) {
          chomp;
          my @bits = grep{ defined } m[
              ( ^ [^(:]+ )
              (?:
                  (?: \( ( \d+ ) \) )? :
                  (?: .*? \( ( [^)]+ ) \)
                      (?: .*? \( ( [^)]+ ) \)
                          (?: .*? \( ( [^)]+ ) \)
                              (?: .*? \( ( [^)]+ ) \) )?
                          )?
                      )?
                  )?
                  .*
              )?
              $
          ]x;
          print join'/', @bits;
      }

      =Output

      P:\test>junk
      OFF
      SUCCESS/abc
      ERROR/1/27/11:03
      WARNING/1
      TEST/255/this/that/the other/and this

      =cut

      __DATA__
      OFF
      SUCCESS: (abc)
      ERROR(1): disk number (27) crashed at (11:03)
      WARNING(1): system is rebooting
      TEST(255): (this) (that) (the other) (and this) (but not this)
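
The limitation being lamented here can be seen in a small sketch (variable names are made up for illustration): a capture group inside a quantified group keeps only the value from its last iteration, whereas a /g match in list context returns one capture per match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rest = ' (this) (that) (the other)';

# The outer group repeats three times, but capture group 1 is overwritten
# on each iteration, so only the final value survives.
my @last_only = $rest =~ /^(?:.*?\(([^)]+)\))+/;

# With /g in list context, every match contributes its own capture.
my @every = $rest =~ /\(([^)]+)\)/g;

print join('/', @last_only), "\n";    # the other
print join('/', @every), "\n";        # this/that/the other
```
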

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

Re: String parsing
by ysth (Canon) on Jan 05, 2004 at 12:10 UTC
    my $tempstr = $str;
    # get rid of () around status
    $tempstr =~ s/^([^(:]+)\(([^)]+)\)/$1$2/;
    # grab the status and anything in parentheses
    @needed = $tempstr =~ /(^[^:]+|(?<=\()[^)]+(?=\)))/g;

    (It's really been quite a lookbehind kind of day :)

    Update: or do it the other way around:

    @needed = $_ =~ /(^[^:]+|(?<=\()[^)]+(?=\)))/g;
    $needed[0] =~ y/()//d;

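
For anyone wanting to try the second variant, here is a minimal sketch that wraps it in a sub and runs it over the sample strings from the question. The sub name extract_fields is hypothetical, and the input is assumed to be already chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the status (with exit code appended, parentheses removed)
# followed by every value that appears inside parentheses.
sub extract_fields {
    my ($str) = @_;
    my @needed = $str =~ /(^[^:]+|(?<=\()[^)]+(?=\)))/g;
    $needed[0] =~ tr/()//d;
    return @needed;
}

for my $line (
    'OFF',
    'SUCCESS: (abc)',
    'ERROR(1): disk number (27) crashed at (11:03)',
    'WARNING(1): system is rebooting',
) {
    print join(', ', extract_fields($line)), "\n";
}
```
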
Re: String parsing
by Hena (Friar) on Jan 05, 2004 at 10:57 UTC
    Well, I wouldn't do it with one command, but this should work.

    # this is the original string
    $string="";
    # then split on first ':'
    ($begin,$end)=split (/:/,$string,2);
    # remember every () separately
    # this assumes that between () there are no more (), eg not (27(b))
    while ($end=~m/\((.+?)\)/g) {
      push (@result,$1);
    }
    # add first
    unshift (@result,$begin);
    # if wanted to remove () from first
    # check if need to escape
    $result[0]=~s/[()]//g;

    Update: fix typo.
      My solution is in the same realm.
      my($begin, $end) = split /:/, $string, 2;
      $begin =~ tr/()//d;
      @result = $begin;
      push @result, $end =~ m/\((.+?)\)/g;

Re: String parsing
by Anonymous Monk on Jan 05, 2004 at 10:54 UTC
    What have you tried so far? What exactly are you having trouble with (where is your brain block)?

Node Type: perlquestion [id://318803]
Approved by Corion