parsing multi-line output from a cli command

TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: parsing multi-line output from a cli command by bart (Canon) on Jan 23, 2004 at 20:18 UTC
So your summary comes second, and can be multiline. Hmm. Anyway, what distinguishes the first line from the following lines is the leading whitespace on the latter. What I'd do is something like the following, which doesn't quite work according to your minimal spec, but is better, IMHO:

    my(@records, $ref);
    while(<DATA>) {
        chomp;
        if(my @fields = /^(\S+): \[(.+?)\] \[(.+?)\] \[(.+?)\] \[(.+?)\]\s*$/) {
            my %record;
            @record{'ticket#', qw(customer status priority owner)} = @fields;
            push @records, $ref = \%record;
        } elsif(s/^\s+//) {
            if(defined $ref->{summary}) {
                $ref->{summary} .= "\n$_";
            } else {
                $ref->{summary} = $_;
            }
        } else {
            warn "Oops: no match in: $_\n";
        }
    }
    use Data::Dumper;
    print Dumper \@records;
    __DATA__
    fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]
      navicli chglun -l 3 -name "newname" doesn't work
    fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]
      The GUI needs to hide the CPP SEs from the unimported list
    fabx-t162: [haurora] [UNAN/OWNR] [C1] [glade]
      Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent
    (This line won't match)

This prints a Data::Dumper dump of `@records`; the first line of output is a warning, which goes to STDERR.

Perhaps a tiny bit of explanation is in order. When the first line of a record is encountered, a reference to a fresh hash, a new record holding the data from that first line, is pushed onto the global array `@records`. At the same time, a reference to this latest record is kept in the variable `$ref`. We can use that ref to modify the original record even though it's already on the array. So I use it to append more summary lines to the hash item for 'summary'.
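A minimal sketch of the reference trick described above, using made-up values rather than the real CLI output: the hash pushed onto `@records` and the hash reached through `$ref` are the same hash, so later writes through `$ref` show up in the array.

```perl
use strict;
use warnings;

my (@records, $ref);

# Start a record and remember a reference to it (sample values are made up).
my %record = ('ticket#' => 'fabx-t160', owner => 'kelrod');
push @records, $ref = \%record;

# The record is already on the array, but $ref still points at the same hash.
$ref->{summary}  = 'first summary line';
$ref->{summary} .= "\nsecond summary line";

print $records[0]{summary}, "\n";
```

In the loop above, `my %record` sits inside the `if` block, so each header line gets a brand-new hash and earlier records are never overwritten.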
Re: Re: parsing multi-line output from a cli command by Not_a_Number (Prior) on Jan 23, 2004 at 20:29 UTC
Re this bit:

    elsif(s/^\s+//) {
        if(defined $ref->{summary}) {
            $ref->{summary} .= "\n$_";
        } else {
            $ref->{summary} = $_;
        }
    }

You can reduce it to:

    elsif(s/^\s+//) {
        $ref->{summary} .= "\n$_";
    }

dave
Re: Re: Re: parsing multi-line output from a cli command by bart (Canon) on Jan 23, 2004 at 20:31 UTC
Except now your summary will always start with a newline. Which is the reason for my elaborate scheme. :)
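One way to keep the short form without the leading newline, sketched here as an assumption rather than anything posted in this thread, is to collect the summary lines in an array and join them once the input is exhausted. The `summary_lines` key and the sample input are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: one header line followed by indented summary lines.
my @lines = (
    'fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]',
    '    first summary line',
    '    second summary line',
);

my (@records, $ref);
for (@lines) {
    my $line = $_;                       # copy, so @lines is left untouched
    if ($line =~ /^(\S+):/) {
        push @records, $ref = { 'ticket#' => $1, summary_lines => [] };
    }
    elsif ($line =~ s/^\s+// and $ref) {
        push @{ $ref->{summary_lines} }, $line;   # no separator needed yet
    }
}

# Join once at the end: newlines appear only between lines.
$_->{summary} = join "\n", @{ delete $_->{summary_lines} } for @records;

print $records[0]{summary}, "\n";
```

The trade-off is a second pass over `@records`, in exchange for a loop body with no special case for the first summary line.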
Re: Re: Re: Re: parsing multi-line output from a cli command by Not_a_Number (Prior) on Jan 23, 2004 at 21:13 UTC
Re: parsing multi-line output from a cli command by duff (Parson) on Jan 23, 2004 at 19:20 UTC
Here's some (untested) code:

    my (@records, $cur_rec);
    while (<DATA>) {
        chomp;
        # An indented line continues the current record's summary
        if (s/^\s+//) { $cur_rec->{'summary'} .= " $_"; next }
        my ($tick,$cust,$stat,$prio,$owner) = /^
            ([\w-]+): \s*    # ticket
            (\[.*?\]) \s     # customer
            (\[.*?\]) \s     # status
            (\[.*?\]) \s     # priority
            (\[.*?\])        # owner
        /x or next;
        $cur_rec = {
            'ticket#'  => $tick,
            'customer' => $cust,
            'status'   => $stat,
            'priority' => $prio,
            'owner'    => $owner,
        };
        push @records, $cur_rec;
    }

You'll want to do better validation I expect. There are also ways to minimize the redundancy but that's left as an exercise for the reader :-)

duff
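The redundancy mentioned above can be trimmed with a hash slice, which assigns all captures to a list of keys in one step. This is only a sketch: `@keys`, `parse_header`, and the sample line are made up for illustration, and unlike the code above the captures here exclude the brackets.

```perl
use strict;
use warnings;

# Field names in the order the regex captures them.
my @keys = ('ticket#', qw(customer status priority owner));

# Hypothetical helper: returns a record hash ref, or nothing if no match.
sub parse_header {
    my ($line) = @_;
    my @vals = $line =~ /^([\w-]+):\s*\[(.*?)\]\s\[(.*?)\]\s\[(.*?)\]\s\[(.*?)\]/
        or return;
    my %rec;
    @rec{@keys} = @vals;     # hash slice: one assignment fills every field
    return \%rec;
}

my $rec = parse_header('fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]');
print "$_ = $rec->{$_}\n" for @keys;
```

Adding a field then means touching only `@keys` and the regex, not a list of variables plus a hash constructor.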
Re: parsing multi-line output from a cli command by Anonymous Monk on Jan 23, 2004 at 20:17 UTC
I'd probably use a hash of hashes, with the outer hash keyed by the ticket id:

    use strict;
    use warnings;

    my @fields = ( 'customer', 'status', 'priority', 'owner' );
    my ( %HoH, $ticket );

    while ( <DATA> ) {
        chomp;
        next if /^$/;

        # Better data validation recommended!
        if ( my @tmp = /
            ^(\S+):
            \s+\[([^\]]+)\]   # square bracket, followed
            \s+\[([^\]]+)\]   # by anything other than a
            \s+\[([^\]]+)\]   # square bracket, followed
            \s+\[([^\]]+)\]   # by a square bracket
            /x )
        {
            $ticket = shift @tmp;
            $HoH{$ticket} = { map { $fields[$_] => $tmp[$_] } 0 .. $#tmp };
        }
        else {
            s/^\s+//;
            $HoH{$ticket}{comment} .= " $_";
        }
    }

    foreach my $record ( keys %HoH ) {
        print "\n$record:\n";
        print "$_ = $HoH{$record}{$_}\n" for keys %{ $HoH{$record} };
    }
    __DATA__
    fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod]
      navicli chglun -l 3 -name "newname" doesn't work.
      Try a different name, retardo!
    fabx-t161: [dozone] [UNAN/OWNR] [C2] [dchoi]
      The GUI needs to hide the CPP SEs from the unimported list
    fabx-t162: [haurora] [UNAN/OWNR] [C1] [glade]
      Cisco hardware related bug :idprom error on cisco switch on loading 0.1.5.5 salagent

(Code tested, but only on a limited data set.) However, YMMV.

dave
Re: parsing multi-line output from a cli command by hmerrill (Friar) on Jan 23, 2004 at 18:52 UTC
Beware: this code is completely untested.

    my($ticket,$customer,$status,$priority,$owner,$summary);
    while (<FILE_HANDLE>) {
        chomp;
        if (substr($_, 0, 1) eq " ") {
            ### 1st char is space - line must be summary line
            ###
            $summary = $_;
            ### since you now have all the pieces of data, you
            ### can build your hash and array element, and then
            ### move on to the next record.
        } else {
            ### 1st char is not a space - this is not a summary
            ### line. Split on whitespace; the fields keep their
            ### brackets and the ticket keeps its trailing colon.
            ($ticket,$customer,$status,$priority,$owner) = split /\s+/;
        }
    }

HTH.
Re: parsing multi-line output from a cli command by Cirollo (Friar) on Jan 23, 2004 at 19:29 UTC
This code matches the line of keys (ticket, customer, etc.), and then continues stuffing lines into the summary field until it matches another line of keys. The part of the regex that matches the ticket # will probably need to be changed, and you might want to think about what kind of whitespace you'll end up with in your string when you have a multi-line summary.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %record;

    while(<>) {
        print;    # echo the input line
        if (/(fabx-t\d+\:) \[(\S+)\] \[(\S+)\] \[(\S+)\] \[(\S+)\]/) {
            # Print the previous record, if there is one.
            print Dumper(\%record) if %record;
            # start a new record hash
            %record = (
                'ticket#'  => $1,
                'customer' => $2,
                'status'   => $3,
                'priority' => $4,
                'owner'    => $5
            );
        } else {
            $record{summary} .= $_;
        }
    }
    # print the last record
    print Dumper(\%record);
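On the whitespace question: because the loop appends raw input lines, a multi-line summary keeps its indentation and newlines. One possible cleanup, shown as a sketch with a made-up summary string, collapses every run of whitespace to a single space and trims the ends.

```perl
use strict;
use warnings;

# Made-up summary built from raw input lines, indentation and newlines included.
my $summary = "    Cisco hardware related bug\n    :idprom error on cisco switch\n";

# Collapse every run of whitespace (indentation, newlines) to a single space,
# then trim a leading or trailing space.
(my $clean = $summary) =~ s/\s+/ /g;
$clean =~ s/^ | $//g;

print "$clean\n";
```

Whether to flatten the summary like this or keep the line breaks depends on what consumes the record afterwards.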
Re: parsing multi-line output from a cli command by Vautrin (Hermit) on Jan 23, 2004 at 18:29 UTC
If the summary is always on a single line and is always the second line, you could put the second line directly in the summary and parse only the first line for fields. If not, you will have to take that into account. What characters are allowed to appear in the summary? If any characters can appear, and the summary can be multiple lines, and it's on a second line because the code is wrapping, you will have problems, because you can't split on characters such as `[`, `]`, or `:`.

If you have access to the code of the original CLI you may want to change the output a little bit. I would recommend outputting the results as XML if that's an option. That would make parsing much easier for you and for anyone else who needs to parse it.

Otherwise you may need to get tricky if the worst case scenario for everything I've mentioned is true. You could split on the regular expression: `/^.*: \[.*?\] \[.*?\] \[.*?\] \[.*?\].*$/` (I think, not tested). You'd then have all of the comments. Or you could read line by line and use that regular expression with the $`, $&, $' variables. (That's the thing before the match, the thing that was matched, and the thing after the match.) Be warned though that once they're present in code they will be created for all regular expressions, which can slow down your code.

Just some thoughts,

Vautrin
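On Perl 5.10 and later, the `/p` modifier together with `${^PREMATCH}`, `${^MATCH}`, and `${^POSTMATCH}` gives the same three pieces for a single match without the program-wide cost. A small sketch, assuming a made-up input line where the summary follows the bracketed fields on the same line:

```perl
use strict;
use warnings;
use 5.010;    # /p and the ${^...} match variables need Perl 5.10 or newer

# Made-up line with the summary trailing the bracketed fields.
my $line = 'fabx-t160: [ggurudut] [UNAN/OWNR] [C2] [kelrod] chglun fails';

my ($header, $rest);
if ($line =~ /^\S+: \[.*?\] \[.*?\] \[.*?\] \[.*?\]/p) {
    $header = ${^MATCH};        # what $& would hold
    $rest   = ${^POSTMATCH};    # what $' would hold
    $rest =~ s/^\s+//;          # drop the space between header and summary
}

print "header: $header\nsummary: $rest\n";
```

Plain capturing parentheses around the parts you need are another way to avoid the three special variables altogether.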
