PerlMonks  

parsing multi-line output from a cli command

by TASdvlper (Monk)
on Jan 23, 2004 at 17:56 UTC

TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

All,

I have a cli command and the output looks like the following:

fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]
     navicli chglun -l 3 -name "newname" doesn't work
fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]
     The GUI needs to hide the CPP SEs from the unimported list
fabx-t162: [haurora] [UNAN/OWNR] [C1] [glade]
     Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent

The 1st line has the ticket#, customer, status, priority and owner, and the second line is a summary.

I would like to parse this output and build an array of hashes with the keys mentioned above. For example, from the last record above (syntax may be wrong):

$myarray[0] = {
    'ticket#'  => 'fabx-t162:',
    'customer' => '[haurora]',
    'status'   => '[UNAN/OWNR]',
    'priority' => '[C1]',
    'owner'    => '[glade]',
    'summary'  => 'Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent'
};

I've never parsed output where I'm trying to gather information from multiple lines. I can't really split on spaces because the summary could have spaces. So, I'm a little lost on how I get started.

Any thoughts on this ?

Thanks a bunch all.

Replies are listed 'Best First'.
Re: parsing multi-line output from a cli command
by bart (Canon) on Jan 23, 2004 at 20:18 UTC
    So your summary comes second, and can be multiline. Hmm. Anyway, what distinguishes the first line from the following ones is the leading whitespace on the latter.

    What I'd do is something like the following, which doesn't quite match your minimal spec but is better, IMHO:

    my(@records, $ref);
    while(<DATA>) {
        chomp;
        if(my @fields = /^(\S+): \[(.+?)\] \[(.+?)\] \[(.+?)\] \[(.+?)\]\s*$/) {
            my %record;
            @record{'ticket#', qw(customer status priority owner)} = @fields;
            push @records, $ref = \%record;
        } elsif(s/^\s+//) {
            if(defined $ref->{summary}) {
                $ref->{summary} .= "\n$_";
            } else {
                $ref->{summary} = $_;
            }
        } else {
            warn "Oops: no match in: $_\n";
        }
    }
    use Data::Dumper;
    print Dumper \@records;
    __DATA__
    fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]
         navicli chglun -l 3 -name "newname" doesn't work
    fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]
         The GUI needs to hide the CPP SEs from the unimported list
    fabx-t162: [haurora] [UNAN/OWNR] [C1] [glade]
         Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent
    (This line won't match)

    which produces the output (the first line is a warning, which goes to STDERR):

    Perhaps a tiny bit of explanation is in order. When the first line of a record is encountered, a reference to a fresh hash holding the data from that line is pushed onto the array @records. At the same time, a copy of that reference is kept in the variable $ref. We can use $ref to modify the record even though it is already on the array. So I use it to append more summary lines to the hash item for 'summary'.
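
    The reference trick described above can be seen in isolation in a minimal sketch like the following. The record contents are made up for illustration; only the idea of keeping a second copy of the reference comes from the code above.

```perl
use strict;
use warnings;

my (@records, $ref);

# Hypothetical record, for illustration only.
my %record = ( 'ticket#' => 'fabx-t160' );

# Push a reference onto the array and keep a copy of it in $ref.
push @records, $ref = \%record;

# Writing through $ref changes the same hash that $records[0] points to.
$ref->{summary}  = 'first summary line';
$ref->{summary} .= "\nsecond summary line";

print $records[0]{summary}, "\n";
```

    Because both $ref and $records[0] refer to one hash, nothing has to be copied back into the array after the summary grows.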

      Re this bit:

      elsif(s/^\s+//) {
          if(defined $ref->{summary}) {
              $ref->{summary} .= "\n$_";
          } else {
              $ref->{summary} = $_;
          }
      }

      You can reduce it to:

      elsif(s/^\s+//) {
          $ref->{summary} .= "\n$_";
      }

      dave

        Except now your summary will always start with a newline. Which is the reason for my elaborate scheme. :)
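
        A possible middle ground, sketched here with a hypothetical helper named append_summary (not part of either post above), keeps the append short while adding the newline only between lines:

```perl
use strict;
use warnings;

# Append $line to $rec->{summary}, putting "\n" only between lines,
# so the first summary line does not get a leading newline.
sub append_summary {
    my ($rec, $line) = @_;
    $rec->{summary} = defined $rec->{summary}
        ? "$rec->{summary}\n$line"
        : $line;
    return $rec;
}

my $rec = {};
append_summary($rec, 'first line');
append_summary($rec, 'second line');
print $rec->{summary}, "\n";
```

        Another option would be to push the lines onto an array and join them with "\n" once the record is complete.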
Re: parsing multi-line output from a cli command
by duff (Parson) on Jan 23, 2004 at 19:20 UTC

    Here's some (untested) code:

    my (@records, $cur_rec);
    while (<DATA>) {
        chomp;
        if (s/^\s+//) { $cur_rec->{'summary'} .= " $_"; next }
        my ($tick,$cust,$stat,$prio,$owner) = /^
            ([\w-]+): \s*    # ticket
            (\[.*?\]) \s*    # customer
            (\[.*?\]) \s*    # status
            (\[.*?\]) \s*    # priority
            (\[.*?\])        # owner
        /x or next;
        $cur_rec = {
            'ticket#'  => $tick,
            'customer' => $cust,
            'status'   => $stat,
            'priority' => $prio,
            'owner'    => $owner,
        };
        push @records, $cur_rec;
    }

    You'll want to do better validation I expect. There are also ways to minimize the redundancy but that's left as an exercise for the reader :-)
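
    One way the redundancy might be reduced is a hash slice, sketched below. The helper name parse_header and the @keys list are assumptions made for this example; the regex is the one from the code above, with an end-of-line anchor added as a small validation step.

```perl
use strict;
use warnings;

# Field names, in the order the regex captures them.
my @keys = ('ticket#', qw(customer status priority owner));

# Parse one header line. Returns a hash ref, or nothing if the
# line does not look like a header.
sub parse_header {
    my ($line) = @_;
    my @values = $line =~ /^
        ([\w-]+): \s*    # ticket
        (\[.*?\]) \s*    # customer
        (\[.*?\]) \s*    # status
        (\[.*?\]) \s*    # priority
        (\[.*?\]) \s*    # owner
    $/x or return;
    my %rec;
    @rec{@keys} = @values;    # hash slice: assign all five fields at once
    return \%rec;
}

my $rec = parse_header('fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]');
printf "%s is owned by %s\n", $rec->{'ticket#'}, $rec->{owner} if $rec;
```

    With the names held in one list, adding a field means touching the regex and @keys only.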

Re: parsing multi-line output from a cli command
by Anonymous Monk on Jan 23, 2004 at 20:17 UTC

    I'd probably use a hash of hashes, with the outer hash keyed by the ticket id:

    use strict;
    use warnings;

    my @fields = ( 'customer', 'status', 'priority', 'owner' );
    my ( %HoH, $ticket );

    while ( <DATA> ) {
        chomp;
        next if /^$/;

        # Better data validation recommended!
        if ( my @tmp = /
                         ^(\S+):
                         \s+\[([^\]]+)\]   # square bracket, followed
                         \s+\[([^\]]+)\]   # by anything other than a
                         \s+\[([^\]]+)\]   # square bracket, followed
                         \s+\[([^\]]+)\]   # by a square bracket
                       /x ) {
            $ticket = shift @tmp;
            $HoH{$ticket} = { map { $fields[$_] => $tmp[$_] } 0 .. $#tmp };
        }
        else {
            s/^\s+//g;
            $HoH{$ticket}{comment} .= " $_";
        }
    }

    foreach my $record ( keys %HoH ) {
        print "\n$record:\n";
        print "$_ = $HoH{$record}{$_}\n" for keys %{ $HoH{$record} };
    }

    __DATA__
    fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]
         navicli chglun -l 3 -name "newname" doesn't work. Try a different name!
    fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]
         The GUI needs to hide the CPP SEs from the unimported list
    fabx-t162: [haurora] [UNAN/OWNR] [C1] [glade]
         Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent

    (Code tested, but only on a limited data set)

    However, YMMV.

    dave

Re: parsing multi-line output from a cli command
by hmerrill (Friar) on Jan 23, 2004 at 18:52 UTC
    Beware: this code is completely untested.
    my ($ticket, $customer, $status, $priority, $owner, $summary);
    while (<FILE_HANDLE>) {
        chomp;
        if (substr($_, 0, 1) eq " ") {
            ### 1st char is space - line must be summary line
            ### (string comparison needs eq, not ==)
            $summary = $_;
            ### since you now have all the pieces of data, you
            ### can build your hash and array element, and then
            ### move on to the next record.
        } else {
            ### 1st char is not a space - this is not a summary
            ### line. In the sample data none of the five fields
            ### contains whitespace, so split on whitespace.
            ($ticket, $customer, $status, $priority, $owner) = split /\s+/;
        }
    }
    HTH.
Re: parsing multi-line output from a cli command
by Cirollo (Friar) on Jan 23, 2004 at 19:29 UTC
    This code matches the line of keys (ticket, customer, etc), and then continues stuffing lines into the summary field until it matches another line of keys. The part of the regex that matches the ticket # will probably need to be changed, and you might want to think about what kind of whitespace you'll end up with in your string when you have a multi-line summary.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %record;
    my $summary_flag;

    while(<>) {
        print;
        if (/(fabx-t\d+\:) \[(\S+)\] \[(\S+)\] \[(\S+)\] \[(\S+)\]/) {
            # Print the previous record.
            print Dumper(\%record);
            # start a new record hash
            %record = (
                'ticket#'  => $1,
                'customer' => $2,
                'status'   => $3,
                'priority' => $4,
                'owner'    => $5
            );
        } else {
            $record{summary} .= $_;
        }
    }
    # print the last record
    print Dumper(\%record);
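
    On the whitespace question, one option is to normalize the summary once the record is complete. This is only a sketch; the helper name tidy_summary is made up for the example, and collapsing everything to single spaces is an assumption about what you want.

```perl
use strict;
use warnings;

# Collapse runs of whitespace (including newlines and the leading
# indentation of continuation lines) into single spaces, then trim.
sub tidy_summary {
    my ($summary) = @_;
    $summary =~ s/\s+/ /g;
    $summary =~ s/^ | $//g;
    return $summary;
}

print tidy_summary("     Cisco hardware related bug\n     on loading salagent\n"), "\n";
```

    If the line breaks in the summary matter to you, trim each line instead and keep the "\n" between them.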
Re: parsing multi-line output from a cli command
by Vautrin (Hermit) on Jan 23, 2004 at 18:29 UTC

    If the summary is always on a single line and is the second line you could always put the second line directly in the summary and the first line could be parsed for input. If not you will have to take that into account.

    What characters are allowed to appear in the summary? If any character can appear, and the summary can span multiple lines (for example because the tool wraps long lines), you will have problems, because you can't split on characters such as [, ], or :. If you have access to the code of the original CLI you may want to change the output a little bit. I would recommend outputting the results as XML if that's an option. That would make parsing much easier for you and for anyone else who needs to do it.

    Otherwise you may need to get tricky if the worst-case scenario for everything I've mentioned is true. You could split on the regular expression: /^.*: \[.*?\] \[.*?\] \[.*?\] \[.*?\].*$/ (I think, not tested). You'd then have all of the comments. Or you could read line by line and use that regular expression with the $`, $&, $' variables. (That's the text before the match, the text that was matched, and the text after the match.) Be warned, though, that once they appear anywhere in your code they will be set for every regular expression match, which can slow down your code.
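
    A rough sketch of the split idea, assuming the whole output has been read into one string and that header lines look like the samples in the question. The regex here is a tightened variant of the one above (with /m and a capturing group), not a tested drop-in, and the key names header and summary are made up for the example.

```perl
use strict;
use warnings;

my $output = join '',
    "fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]\n",
    "     navicli chglun -l 3 -name \"newname\" doesn't work\n",
    "fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]\n",
    "     The GUI needs to hide the CPP SEs from the unimported list\n";

# Split on whole header lines. The capturing group makes split return
# the headers too, so the list alternates header, summary, header, ...
# The first field (text before the first header) is discarded.
my (undef, @parts) =
    split /^(\S+: \[.*?\] \[.*?\] \[.*?\] \[.*?\])[ \t]*\n/m, $output;

my @records;
while (my ($header, $summary) = splice @parts, 0, 2) {
    $summary = '' unless defined $summary;   # last record may lack a summary
    $summary =~ s/^\s+|\s+$//g;              # trim indentation and newline
    push @records, { header => $header, summary => $summary };
}

print "$_->{header}\n  => $_->{summary}\n" for @records;
```

    Using a capturing group this way avoids $`, $& and $' altogether; the header string can then be parsed further with a second match.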

    Just some thoughts,

    Vautrin
