PerlMonks  

text sorting question, kinda

by ybiC (Prior)
on Aug 03, 2000 at 00:29 UTC ( [id://25837]=perlquestion )

ybiC has asked for the wisdom of the Perl Monks concerning the following question:

I need to automate parsing of a small (600 lines, 200KB) text log file.

No problem with simple regexes to remove unwanted text and lines, but what's tripping me up is partial re-ordering. I want to move every line that contains blahblah to the top of the file, but the blahblah lines must keep their original sequence relative to each other.

Is there a CPAN module for such things? I dug around, but found no such beast. I could sure use a nudge in the right direction, or even code examples.
    cheers,
    ybiC

UNORDERED (before):
    foo 1 zot
    foo 2 blahblah
    bar 1 zot
    bar 2 zot
    bat 1 blahblah
    bat 2
    baz 1
    baz 2 zot

ORDERED (after):
    foo 2 blahblah
    bat 1 blahblah
    foo 1 zot
    bar 1 zot
    bar 2 zot
    bat 2
    baz 1
    baz 2 zot

Replies are listed 'Best First'.
Re: text sorting question
by plaid (Chaplain) on Aug 03, 2000 at 00:37 UTC
    The first way that comes to mind offhand would be to do something like
        my @top_lines;
        my @other_lines;
        while(<FILE>) {
            # whatever regex to strip out lines
            (/blahblah$/) ? push @top_lines, $_ : push @other_lines, $_;
        }
        print OUTFILE @top_lines;
        print OUTFILE @other_lines;
    This would work well provided that the file doesn't get too big, as the entire thing is going to have to be kept in memory. But, if it's about 200k as you say, that should work fine.
(Ovid) Re: text sorting question
by Ovid (Cardinal) on Aug 03, 2000 at 00:53 UTC
    For what it's worth, here's my take on the situation:
        #!/usr/bin/perl -w
        use strict;

        my (@data, $i);
        my @logfile = <DATA>;
        for (@logfile) {
            /blahblah/ ? splice @data, $i++, 0, $_ : push @data, $_;
        }
        print @data;

        __DATA__
        foo 1 zot
        foo 2 blahblah
        bar 1 zot
        bar 2 zot
        bat 1 blahblah
        bat 2
        baz 1
        baz 2 zot
    It works fine and only needs one array.

    Cheers,
    Ovid

    Update: Clearly, I am smoking crack. larsen pointed out that I kept an extra array. Here's what I meant to post:

        #!/usr/bin/perl -w
        use strict;

        my (@data, $i);
        for (<DATA>) {
            /blahblah/ ? splice @data, $i++, 0, $_ : push @data, $_;
        }
        print @data;

        __DATA__
        foo 1 zot
        foo 2 blahblah
        bar 1 zot
        bar 2 zot
        bat 1 blahblah
        bat 2
        baz 1
        baz 2 zot
    There. Only one array. (sigh)
      Yes, but there's @logfile that contains the entire file. And you have:
      $#logfile + $#data > $#toplines + $#otherlines


      see you
      Larsen
Re: text sorting question
by ferrency (Deacon) on Aug 03, 2000 at 01:03 UTC
    You can also get away with one array but only storing the non-matching lines, and printing out (or processing) the matching ones immediately. (slight rework of Plaid's code...)

        my @other_lines;
        while(<FILE>) {
            # whatever regex to strip out lines
            # matching lines go straight to the output; the rest are held back
            (/blahblah$/) ? print(OUTFILE $_) : push(@other_lines, $_);
        }
        print OUTFILE @other_lines;
    This way, not only do you use just one array, but you don't even store all of the lines in it: it only holds the non-matching lines.

    Alan

Re: text sorting question
by DrManhattan (Chaplain) on Aug 03, 2000 at 08:12 UTC
    Here's the second shortest one I could come up with:
        #!/usr/bin/perl -l
        print for
            map  { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map  { [$_, ( /blahblah/ ? 0 : 1 ) . $_] } <>;

    It uses a Schwartzian Transform that compares the second elements of an array looking like this:

        (
            ["bat 1 blahblah", "0bat 1 blahblah"],
            ["foo 2 blahblah", "0foo 2 blahblah"],
            ["bar 1 zot", "1bar 1 zot"],
            ["bar 2 zot", "1bar 2 zot"]
        )
    The lines that match the regex get prepended with a 0 and the ones that don't match get a 1. That way the matching lines always sort ahead of the non-matching ones in a cmp.

    Update: Fixed a typo in the array.

    Here's a shorter, faster one using substr instead of arrays:

        #!/usr/bin/perl -l
        print for
            map { substr($_, 1) }
            sort
            map { ( /blahblah/ ? 0 : 1 ) . $_ } <>;

    -Matt
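One thing to watch with the two sort-based versions above: the sort key includes the text of the line, so inside each group the lines come out in alphabetical order, which is not what the original question asked for. A minimal sketch of the same decorate-sort-undecorate idea that keeps file order, assuming a hypothetical helper named `matches_first` (not from the thread), uses the line's position as a tie-breaker:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: move lines matching $re to the front while keeping
# the original relative order inside both groups.
sub matches_first {
    my ($re, @lines) = @_;
    my $n = 0;
    return
        map  { $_->[2] }                        # undecorate: keep only the line
        sort { $a->[0] <=> $b->[0]              # group: 0 = match, 1 = no match
            || $a->[1] <=> $b->[1] }            # tie-break on original position
        map  { [ (/$re/ ? 0 : 1), $n++, $_ ] }  # decorate: [group, index, line]
        @lines;
}

my @sample = ("foo 1 zot\n", "foo 2 blahblah\n", "bar 1 zot\n",
              "bat 1 blahblah\n", "bat 2\n");
print matches_first(qr/blahblah/, @sample);
```

With the sample data, "foo 2 blahblah" is printed before "bat 1 blahblah", followed by the remaining lines in their input order. The explicit index avoids relying on the stability of Perl's built-in sort.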

Re: text sorting question
by Boogman (Scribe) on Aug 03, 2000 at 00:51 UTC
    If you don't want to be saving any of the elements in arrays or anything like that, you could always just make two runs through the file. The first would check for the 'blah blah' lines and write those to a temporary file. The second would append the remaining lines to the end. Sure it means you have to read through the file twice, but if you're worried about memory usage, this only holds on to one line at a time.
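A rough sketch of the two-pass idea, assuming a hypothetical helper `two_pass_reorder` and placeholder file names. Instead of a separate temporary file it writes both passes straight to the output file, which is a simplification of what is described above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in two passes so that only one
# line is held in memory at a time.
sub two_pass_reorder {
    my ($in, $out, $re) = @_;
    open my $ofh, '>', $out or die "Couldn't open $out: $!";
    for my $want_match (1, 0) {          # pass 1: matching lines, pass 2: the rest
        open my $ifh, '<', $in or die "Couldn't open $in: $!";
        while (my $line = <$ifh>) {
            my $is_match = $line =~ $re ? 1 : 0;
            print {$ofh} $line if $is_match == $want_match;
        }
        close $ifh;
    }
    close $ofh or die "Couldn't close $out: $!";
}

# Example call with placeholder paths:
# two_pass_reorder('/dir/file.in', '/dir/file.out', qr/blahblah/);
```

The input is read twice, but memory use stays constant regardless of file size.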
Fast simple approach
by gryng (Hermit) on Aug 03, 2000 at 02:15 UTC
    Just put the blah blahs in output file 1, and non blah blahs in output file 2. Then when you are done:
    `cat out1.txt out2.txt > final-output.txt;rm out[12].txt`
    You could use arrays if you don't mind the memory usage and feel dirty using temporary files :) . Don't know how clean this is supposed to be :)

    Ciao,
    Gryn
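For anyone who would rather not shell out to cat and rm, here is a sketch of the same split-then-join approach in plain Perl. The helper name `split_and_join` and the ".1"/".2" scratch-file suffixes are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one pass over $in, splitting it into two scratch
# files, which are then joined into $out and removed.
sub split_and_join {
    my ($in, $out, $re) = @_;
    my @parts = ("$out.1", "$out.2");    # assumed scratch-file names

    open my $ifh,  '<', $in       or die "Couldn't open $in: $!";
    open my $hit,  '>', $parts[0] or die "Couldn't open $parts[0]: $!";
    open my $miss, '>', $parts[1] or die "Couldn't open $parts[1]: $!";
    while (my $line = <$ifh>) {
        # pick the output handle based on whether the line matches
        print { $line =~ $re ? $hit : $miss } $line;
    }
    close $ifh;
    close $hit  or die "Couldn't close $parts[0]: $!";
    close $miss or die "Couldn't close $parts[1]: $!";

    # Equivalent of: cat out1 out2 > final; rm out1 out2
    open my $ofh, '>', $out or die "Couldn't open $out: $!";
    for my $part (@parts) {
        open my $pfh, '<', $part or die "Couldn't open $part: $!";
        while (my $line = <$pfh>) {
            print {$ofh} $line;
        }
        close $pfh;
    }
    close $ofh or die "Couldn't close $out: $!";
    unlink @parts;
}
```

Like the shell version, this reads the input only once and never holds more than one line in memory.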

Re: text sorting question
by eLore (Hermit) on Aug 03, 2000 at 00:40 UTC
    I don't know if it's the most efficient, but what about pushing all of the "top of the list" items into one array, the rest into another, then reverse order prepending them onto item2?
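        my (@to_move, @regular_order);
        while (<INFILE>) {
            if (/blahblah/) {    # the string to move
                push @to_move, $_;
            } else {
                push @regular_order, $_;
            }
        }
        # prepend in reverse so the moved lines keep their original order
        while (@to_move) {
            unshift @regular_order, pop @to_move;
        }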
    Completely untested, no warranty offered or implied! Someone please do it better...

    UPDATE Plaid did it better.

Re: text sorting question
by eak (Monk) on Aug 03, 2000 at 00:58 UTC
    If you can get the data into an array of arrays, the following would work very nicely. There is probably a nicer way to write the loop than a foreach, though.
        #!/usr/bin/perl -w
        my @array = (
            ['foo', 1, 'zot'],
            ['foo', 2, 'blahblah'],
            ['bar', 1, 'zot'],
            ['bar', 2, 'zot'],
            ['bat', 1, 'blahblah'],
            ['bat', 2, ''],
            ['baz', 1, ''],
            ['baz', 2, 'zot'],
        );
        my @sorted;
        foreach my $array (@array){
            ($array->[2] eq 'blahblah' ? unshift @sorted, $array : push @sorted, $array);
        }
    --eric
      That's a nice example, but it reverses the order of the "blahblah" lines. That was something that ybiC was trying to avoid. I spotted that immediately because I made the same mistake at first :)

      Cheers,
      Ovid
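One way to keep the array-of-arrays layout without reversing the "blahblah" rows is to filter twice with grep, which preserves input order. This is only a sketch, and the helper name `blahblah_first` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rows whose third field is 'blahblah' come first.
# grep returns elements in input order, so neither group is reversed.
sub blahblah_first {
    my @rows = @_;
    my @top  = grep { $_->[2] eq 'blahblah' } @rows;
    my @rest = grep { $_->[2] ne 'blahblah' } @rows;
    return (@top, @rest);
}

my @array = (
    ['foo', 1, 'zot'],
    ['foo', 2, 'blahblah'],
    ['bar', 1, 'zot'],
    ['bat', 1, 'blahblah'],
    ['bat', 2, ''],
);
print "@$_\n" for blahblah_first(@array);
```

With this sample, the "foo 2" row is printed ahead of the "bat 1" row, matching their order in the input. The cost is a second walk over the list, which is negligible for a few hundred lines.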

Re: text sorting question
by ray (Initiate) on Aug 03, 2000 at 01:43 UTC
    The following will do the sort you want, after the log has been loaded into the array  @v
        my @sorted =
            map  { $_->[2] }
            sort { ($a->[0])*($#v+1) + $a->[1] <=> ($b->[0])*($#v+1) + $b->[1] }
            map  { [ !((split '\s+', $v[$_])[2] eq 'blahblah'), $_, $v[$_] ] }
            0 .. $#v;
    Later,
    Ray.
      Here is a slightly modified version of the above, using '||' in the 'sort' block to make sure the lines stay in the proper order.
          my @sorted =
              map  { $_->[2] }
              sort { $b->[0] <=> $a->[0] || $a->[1] <=> $b->[1] }
              map  { [ (split '\s+', $file[$_])[2] eq 'blahblah', $_, $file[$_] ] }
              0 .. $#file;
RE: text sorting question, kinda (simple result)
by ybiC (Prior) on Aug 03, 2000 at 05:38 UTC
    Here's the snippet I ended up with - it's not much more than verbatim from plaid's answer.

    Ovid's answer looked interesting too, but I try not to take things from people on crack.   <big grin>

    Update: Thanks as well to other fine Monks who offered answers.
        cheers,
        ybiC

        #!/usr/bin/perl -w
        # parse a log file and move lines with important text to top
        # of file while keeping sequence within each of two sections:
        # important and not-so-important
        use strict;

        my $infile  = '/dir/file.in';
        my $outfile = '/dir/file.out';
        my @important;
        my @normal;

        open IN, "$infile" or die "Couldn't open $infile";
        open OUT, ">$outfile" or die "Couldn't open $outfile";
        while (<IN>) {
            s/unwanted text//g;        # strip unwanted text
            s/more unwanted text//g;   # strip unwanted text
            s/^\s+//g;                 # remove empty lines
            (/important text/) ? push @important, $_ : push @normal, $_;
        }
        print OUT @important;
        print OUT @normal;
        close IN  or die "Couldn't close $infile";
        close OUT or die "Couldn't close $outfile";
        # END