ybiC has asked for the wisdom of the Perl Monks concerning the following question:
I need to automate parsing of a small (600 lines, 200KB) text log file.
No problem with simple regexes to remove unwanted text and lines, but what's tripping me up is partial re-ordering. I want to move every line that contains blahblah to the top of the file, but the blahblah lines must still remain in their same sequence.
Is there a CPAN module for such things? I dug around, but found no such beast. I could sure use a nudge in the right direction, or even code examples.
cheers,
ybiC
UNORDERED(before):
foo 1 zot
foo 2 blahblah
bar 1 zot
bar 2 zot
bat 1 blahblah
bat 2
baz 1
baz 2 zot
ORDERED(after):
foo 2 blahblah
bat 1 blahblah
foo 1 zot
bar 1 zot
bar 2 zot
bat 2
baz 1
baz 2 zot
Re: text sorting question
by plaid (Chaplain) on Aug 03, 2000 at 00:37 UTC
The first way that comes to mind offhand would be to do
something like
my @top_lines;
my @other_lines;
while(<FILE>) {
# whatever regex to strip out lines
(/blahblah$/) ? push @top_lines, $_ : push @other_lines, $_;
}
print OUTFILE @top_lines;
print OUTFILE @other_lines;
This would work well provided that the file doesn't get
too big, as the entire thing is going to have to be kept
in memory. But, if it's about 200k as you say, that
should work fine.
(Ovid) Re: text sorting question
by Ovid (Cardinal) on Aug 03, 2000 at 00:53 UTC
For what it's worth, here's my take on the situation:
#!/usr/bin/perl -w
use strict;
my (@data, $i);
my @logfile = <DATA>;
for (@logfile) {
/blahblah/ ? splice(@data, $i++, 0, $_) : push(@data, $_);
}
print @data;
__DATA__
foo 1 zot
foo 2 blahblah
bar 1 zot
bar 2 zot
bat 1 blahblah
bat 2
baz 1
baz 2 zot
It works fine and only needs one array.
Cheers,
Ovid
Update: Clearly, I am smoking crack. larsen pointed out that I kept an extra array. Here's what I meant to post:
#!/usr/bin/perl -w
use strict;
my (@data, $i);
for (<DATA>) {
/blahblah/ ? splice(@data, $i++, 0, $_) : push(@data, $_);
}
print @data;
__DATA__
foo 1 zot
foo 2 blahblah
bar 1 zot
bar 2 zot
bat 1 blahblah
bat 2
baz 1
baz 2 zot
There. Only one array. (sigh)
Yes, but there's @logfile that contains the entire file.
And you have:
$#logfile + $#data > $#toplines + $#otherlines
see you
Larsen
Re: text sorting question
by ferrency (Deacon) on Aug 03, 2000 at 01:03 UTC
You can also get away with one array but only storing the
non-matching lines, and printing out (or processing)
the matching ones immediately. (slight rework of Plaid's
code...)
my @other_lines;
while(<FILE>) {
# whatever regex to strip out lines
(/blahblah$/) ? print OUTFILE : push @other_lines, $_;
}
print OUTFILE @other_lines;
This way, not only do you only use one array, but you don't even store All of the lines in it- it only stores the nonmatching lines.
Alan
Re: text sorting question
by DrManhattan (Chaplain) on Aug 03, 2000 at 08:12 UTC
Here's the second shortest one I could come up with:
#!/usr/bin/perl
my $i = 0;
print for map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [$_, ( /blahblah/ ? 0 : 1 ) . sprintf("%06d", $i++)] }
          <>;
It uses a Schwartzian Transform that compares the second
elements of anonymous arrays looking like this:
(
["foo 2 blahblah", "0000001"],
["bat 1 blahblah", "0000004"],
["bar 1 zot", "1000002"],
["bar 2 zot", "1000003"]
)
The lines that match the regex get a key starting with 0
and the ones that don't match get a 1, so the matching
lines always win a cmp with the non-matching ones. The
zero-padded input index after the flag keeps each group in
its original input sequence (a key of just the flag and the
line itself would re-sort each group alphabetically, which
isn't what ybiC asked for).
Here's a shorter one using substr instead of arrays; substr($_, 7)
strips off the one-character flag and the six-digit index:
#!/usr/bin/perl
my $i = 0;
print for map { substr($_, 7) }
          sort
          map { ( /blahblah/ ? 0 : 1 ) . sprintf("%06d", $i++) . $_ }
          <>;
-Matt
Re: text sorting question
by Boogman (Scribe) on Aug 03, 2000 at 00:51 UTC
If you don't want to keep any of the lines in arrays
or anything like that, you could always just make two passes
through the file. The first would check for the 'blahblah' lines
and write those to a temporary file. The second would append
the remaining lines to the end. Sure, it means you have to read
through the file twice, but if you're worried about memory usage,
this only holds on to one line at a time.
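A minimal sketch of that two-pass approach (the demo input and the file names here are made up for illustration; a real script would point at the actual log):

```perl
#!/usr/bin/perl -w
use strict;

# Demo input file, standing in for the real log.
open my $make, '>', 'file.in' or die "Couldn't create file.in: $!";
print $make "foo 1 zot\nfoo 2 blahblah\nbat 1 blahblah\nbat 2\n";
close $make;

open my $out, '>', 'file.out' or die "Couldn't open file.out: $!";

# Pass 1: copy only the blahblah lines, in their original order.
open my $in, '<', 'file.in' or die "Couldn't open file.in: $!";
while (<$in>) {
    print $out $_ if /blahblah/;
}
close $in;

# Pass 2: re-read the input and append everything else.
open $in, '<', 'file.in' or die "Couldn't open file.in: $!";
while (<$in>) {
    print $out $_ unless /blahblah/;
}
close $in;
close $out;
```

Only the current line is ever held in memory, at the cost of reading the input twice.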
Fast simple approach
by gryng (Hermit) on Aug 03, 2000 at 02:15 UTC
Just put the blahblah lines in output file 1, and the non-blahblah lines in output file 2. Then when you are done:
`cat out1.txt out2.txt > final-output.txt;rm out[12].txt`
You could use arrays instead if you don't mind the memory usage and feel dirty using temporary files :). Don't know how clean this is supposed to be :)
Ciao,
Gryn
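The same split-and-concatenate idea can stay entirely in Perl if the shell isn't handy. A rough sketch, with a few sample lines inlined to stand in for the real log:

```perl
#!/usr/bin/perl -w
use strict;

# Sample lines standing in for the real log file.
my @sample = ("foo 1 zot\n", "foo 2 blahblah\n", "bat 1 blahblah\n", "bat 2\n");

# One pass: blahblah lines to out1.txt, everything else to out2.txt.
open my $out1, '>', 'out1.txt' or die "Couldn't open out1.txt: $!";
open my $out2, '>', 'out2.txt' or die "Couldn't open out2.txt: $!";
for (@sample) {
    print { /blahblah/ ? $out1 : $out2 } $_;
}
close $out1;
close $out2;

# Perl stand-in for `cat out1.txt out2.txt > final-output.txt; rm out[12].txt`
open my $final, '>', 'final-output.txt'
    or die "Couldn't open final-output.txt: $!";
for my $tmp ('out1.txt', 'out2.txt') {
    open my $in, '<', $tmp or die "Couldn't open $tmp: $!";
    print {$final} $_ while <$in>;
    close $in;
    unlink $tmp or warn "Couldn't remove $tmp: $!";
}
close $final;
```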
Re: text sorting question
by eLore (Hermit) on Aug 03, 2000 at 00:40 UTC
I don't know if it's the most efficient, but what about pushing all of the "top of the list" items into one array and the rest into another, then prepending the first array onto the second in reverse order?
my (@to_move, @regular_order);
while (<INFILE>) {
    if (/match string to move/) {
        push @to_move, $_;
    } else {
        push @regular_order, $_;
    }
}
while (@to_move) {
    unshift @regular_order, pop @to_move;
}
Completely untested, no warranty offered or implied! Someone please do it better...
UPDATE
Plaid did it better.
Re: text sorting question
by eak (Monk) on Aug 03, 2000 at 00:58 UTC
If you can get the data into an array of arrays, the following would work very nicely.
There is probably a nicer way to do the loop than a foreach, though.
#!/usr/bin/perl -w
my @array = (
['foo', 1, 'zot'],
['foo', 2, 'blahblah'],
['bar', 1, 'zot'],
['bar', 2, 'zot'],
['bat', 1, 'blahblah'],
['bat', 2, ''],
['baz', 1, ''],
['baz', 2, 'zot'],
);
my @sorted;
foreach my $array (@array) {
    $array->[2] eq 'blahblah' ? unshift @sorted, $array : push @sorted, $array;
}
--eric
That's a nice example, but it reverses the order of the "blahblah" lines. That was something that ybiC was trying to avoid. I spotted that immediately because I made the same mistake at first :)
Cheers,
Ovid
Re: text sorting question
by ray (Initiate) on Aug 03, 2000 at 01:43 UTC
The following will do the sort you want, after the log has
been loaded into the array @v:
my @sorted = map  { $_->[2] }
             sort { ($a->[0])*($#v+1) + $a->[1] <=> ($b->[0])*($#v+1) + $b->[1] }
             map  { [ !((split '\s+', $v[$_])[2] eq 'blahblah'), $_, $v[$_] ] }
             0 .. $#v;
Later,
Ray.
Here is a slightly modified version of the above, using '||' in the 'sort' block to make sure the lines stay in the proper order.
my @sorted = map  { $_->[2] }
             sort { $b->[0] <=> $a->[0] || $a->[1] <=> $b->[1] }
             map  { [ (split '\s+', $file[$_])[2] eq 'blahblah', $_, $file[$_] ] }
             0 .. $#file;
RE: text sorting question, kinda (simple result)
by ybiC (Prior) on Aug 03, 2000 at 05:38 UTC
Here's the snippet I ended up with - it's not much more than verbatim from plaid's answer.
Ovid's answer looked interesting too, but I try not to take things from people on crack. <big grin>
Update: Thanks as well to other fine Monks who offered answers.
cheers,
ybiC
#!/usr/bin/perl -w
# parse a log file and move lines with important text to top
# of file while keeping sequence within each of two sections:
# important and not-so-important
use strict;
my $infile = '/dir/file.in';
my $outfile = '/dir/file.out';
my @important;
my @normal;
open IN, $infile or die "Couldn't open $infile: $!";
open OUT, ">$outfile" or die "Couldn't open $outfile: $!";
while (<IN>) {
s/unwanted text//g; # strip unwanted text
s/more unwanted text//g; # strip unwanted text
next if /^\s*$/; # skip lines that are (or have become) empty
(/important text/) ? push @important, $_ : push @normal, $_;
}
print OUT @important;
print OUT @normal;
close IN or die "Couldn't close $infile";
close OUT or die "Couldn't close $outfile";
# END