PerlMonks  

Parsing Files for the Interesting Bit

by Cody Pendant (Prior)
on Jul 05, 2004 at 00:17 UTC

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I have, once again, a big text file which is mostly irrelevant except for the Interesting Bit. Say for instance it's
Line 1 blah blah blah
Line 2 blah blah blah
...
Line 1000 blah blah blah
Line 1001 --start foo--
[lots of stuff I do need to process]
Line 2000 --end foo--

So my probably not-very-programmerly instinct is to just do this:

my $interesting_bit = 0;
while (<FILE>) {
    if (/--end foo--/) {
        $interesting_bit = 0;
        last;    ## assuming only one interesting
                 ## bit per file of course
    }
    if ($interesting_bit == 1) {
        ## do my processing on the lines
    }
    if (/--start foo--/) {
        $interesting_bit = 1;
        # we've found the line which says
        # the next line needs to be processed
    }
}

It works for me, but is that bad practice?

Doing it this way seems quick and straightforward once you've got the order sorted out, but rather clunky -- what do other monks think?



($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re: Parsing Files for the Interesting Bit
by borisz (Canon) on Jul 05, 2004 at 00:24 UTC
    If it works, why change it? I would do it with the .. operator.

    while (<FILE>) {
        if ( /--start foo--/ .. /--end foo--/ ) {
            # do what you like to do here
        }
    }

    Boris
Re: Parsing Files for the Interesting Bit
by Zaxo (Archbishop) on Jul 05, 2004 at 00:46 UTC

    Your markers look like they are fixed strings, so

    if ( $_ eq "--end foo--\n" ) {
        # . . .
    }

    seems like a better test. Leave off "\n" if you chomp it.

    A couple more approaches come to mind. If the file is not too big, you can abuse $/ to get the content in two gulps.

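    my $interesting_bit;
    {
        # maybe open here . . .
        local $/ = '--start foo--' . "\n";
        $interesting_bit = <FILE>;    # everything up to the start marker; overwritten below
        $/ = '--end foo--' . "\n";    # $/ is already localized in this block
        $interesting_bit = <FILE>;    # the interesting bit, end marker included
        # . . . and close here
    }
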
    That could burden memory.
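
As a runnable illustration of the two-gulp idea, here is a minimal sketch. It assumes the markers sit alone on their lines, and it reads from an in-memory string (the sample data and the lexical `$fh` are made up for the example) instead of a real file. The `chomp` relies on the fact that `chomp` removes the current value of `$/`, so it strips the end marker from the captured text.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file.
my $data = join '',
    "Line 1\n", "Line 2\n",
    "--start foo--\n",
    "keep me\n", "and me\n",
    "--end foo--\n",
    "trailing junk\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $interesting_bit;
{
    local $/ = "--start foo--\n";
    my $skipped = <$fh>;          # first gulp: everything through the start marker

    $/ = "--end foo--\n";
    $interesting_bit = <$fh>;     # second gulp: the block plus the end marker

    # chomp strips the current $/, i.e. the end marker line.
    chomp $interesting_bit if defined $interesting_bit;
}
close $fh;

print $interesting_bit if defined $interesting_bit;
```

If the start marker never appears, the first read swallows the whole file and the second returns undef, which is why the sketch checks `defined` before using the result.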

    A third way is similar to yours, but uses the flip-flop operator to condense the code.

    my $interesting_bit;
    {
        # maybe open here . . .
        local $_;
        while (<FILE>) {
            if ( $_ eq "--start foo--\n" .. $_ eq "--end foo--\n" ) {
                $interesting_bit .= $_;
            }
            elsif ($interesting_bit) {
                last;
            }
        }
        # . . . and close
    }

    This makes the same assumption about there being only one interesting bit. If the variable is populated while the flip-flop is false, we've run past the end marker and can quit reading.

    I just noticed that I've changed the $interesting_bit variable from a flag in your code to an accumulator for the content. If you just set it true in the lhs of the flip-flop and do your processing in place of my .= operation, all will be well.

    After Compline,
    Zaxo

Re: Parsing Files for the Interesting Bit
by dws (Chancellor) on Jul 05, 2004 at 01:03 UTC

    I might do something like:

    while ( <FILE> ) {
        if ( /--start foo--/ ) {
            # we're about to start processing
            while ( <FILE> ) {
                last if /--end foo--/;
                # here's a line of stuff to process
            }
            # we're done with this block
        }
    }

    This approach eliminates the conditional, at the risk of not noticing that a file has ended mid block. If that's liable to be a problem, it's a straightforward mod to catch it.
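
One possible version of that modification, as a sketch only: the inner loop records whether it actually saw the end marker, and the code dies if the file runs out first. The `read_block` helper, the sample strings, and the in-memory filehandles are all invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines between the markers, or dies
# if the input ends before the end marker is seen.
sub read_block {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        next unless $line =~ /--start foo--/;

        my $closed = 0;
        while ( $line = <$fh> ) {
            if ( $line =~ /--end foo--/ ) {
                $closed = 1;
                last;
            }
            push @lines, $line;    # a line of stuff to process
        }
        die "File ended inside the foo block\n" unless $closed;
        last;                      # assuming only one block per file
    }
    return @lines;
}

# A complete block.
my $good = "junk\n--start foo--\na\nb\n--end foo--\nmore junk\n";
open my $good_fh, '<', \$good or die "Cannot open: $!";
my @lines = read_block($good_fh);
close $good_fh;
print @lines;

# A block that is cut off before the end marker.
my $bad = "junk\n--start foo--\na\n";
open my $bad_fh, '<', \$bad or die "Cannot open: $!";
my $ok = eval { read_block($bad_fh); 1 };
close $bad_fh;
print "truncated block detected\n" unless $ok;
```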

Re: Parsing Files for the Interesting Bit
by Cody Pendant (Prior) on Jul 06, 2004 at 21:12 UTC
    Thank you all for your help.

    I just wanted to note for posterity that my system works for the situation I was in, namely that "interesting bit start" means "the interesting bit starts on the next line".

    The

    if (/start interesting/ .. /end interesting/) {
        print;
    }

    version prints the "start interesting" and "end interesting" lines themselves, whereas mine starts printing with the line after "start interesting" and stops before printing "end interesting".
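
For what it's worth, the flip-flop can also be made to skip the marker lines. In scalar context the range operator returns a sequence number that starts at 1 on the line matching the left operand and has "E0" appended on the line matching the right operand, so both ends can be filtered out. A minimal sketch, with made-up sample data read from an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file.
my $data = "junk\n--start foo--\nfirst\nsecond\n--end foo--\nmore junk\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @kept;
while (<$fh>) {
    # Scalar assignment puts .. in scalar context, so it acts as a flip-flop.
    my $n = /--start foo--/ .. /--end foo--/;
    next unless $n;            # outside the block
    next if $n == 1;           # the start marker line itself
    next if $n =~ /E0\z/;      # the end marker line itself
    push @kept, $_;            # a line strictly between the markers
}
close $fh;

print @kept;
```

This keeps the compactness of the `..` version while matching the behaviour of the flag-based loop, which never processes either marker line.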


    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
    =~y~b-v~a-z~s; print
