PerlMonks  

In-place file filtering

by princepawn (Parson)
on Jul 24, 2003 at 19:23 UTC ( [id://277660]=perlquestion )

princepawn has asked for the wisdom of the Perl Monks concerning the following question:

I have a series of tab-separated files which look like this:
1	7083
1	7530
1	----------
1	-7840
3	0
20	00000
4	00001
1	00007
3	00010
1	00011
1	00023
2	00044
3	00100
1	00101
1	00112
1	00120
1	00121
1	00149
1	00186
1	00193
1	00200
2	00202
4	00683
3	00725
3	00727
1	00731
1	00735
1	00738
1	00745
1	00749
1	00761
1	00777
2	00778
1	00784
3	00801
12	00802
3	00803
1	00820
4	00823

Because the second field represents a zip code, I want to kill any lines in a file which do not conform to zip code requirements. However, my perl one-liner attempts are failing. How can I get this one-liner to change the file so that only lines which match my filter stay? Here is my attempt:
[tbone@MDB zip-grok]$ perl -pi.bak -e '($count, $zip) = split /\t/; chomp $zip; warn "*$zip*"; next unless ( ($zip =~ /\d{5}/) and ($zip > 713) and ($zip < 99930) )' *.dat

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality

edited by ybiC: balanced <readmore> tags

Replies are listed 'Best First'.
Re: In-place file filtering
by Paladin (Vicar) on Jul 24, 2003 at 19:30 UTC
    The -p command line argument puts the print statement in a continue block, which is run even if next jumps out early.

    Change the -p to -n and add an explicit print after the next unless ....
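To see why, it may help to spell out the loops that perl wraps around the -e code (perlrun documents -p as a while loop with the print in a continue block). The sketch below only emulates both behaviours on an in-memory list of lines rather than real files; the names keep, run_like_p and run_like_n, and the anchored zip test, are illustrative assumptions and not code from this thread.

```perl
use strict;
use warnings;

# Simplified stand-in for the filter in the question (hypothetical helper).
sub keep {
    my ($line) = @_;
    chomp $line;
    my (undef, $zip) = split /\t/, $line;
    return defined $zip && $zip =~ /^\d{5}$/ && $zip > 713 && $zip < 99930;
}

# Roughly what -p does: the print lives in a continue block,
# so "next" skips the rest of the body but still reaches the print.
sub run_like_p {
    my @out;
    for (@_) {
        next unless keep($_);
    }
    continue {
        push @out, $_;    # stands in for the implicit print
    }
    return @out;
}

# Roughly what -n plus an explicit print does: "next" now skips the print too.
sub run_like_n {
    my @out;
    for (@_) {
        next unless keep($_);
        push @out, $_;    # stands in for the explicit print
    }
    return @out;
}

my @lines = ("1\t7083\n", "1\t----------\n", "3\t00725\n", "12\t00802\n");
print "-p style keeps: ", scalar(run_like_p(@lines)), " of ", scalar(@lines), " lines\n";
print "-n style keeps: ", scalar(run_like_n(@lines)), " of ", scalar(@lines), " lines\n";
```

With -p every input line ends up in the output regardless of the filter, which matches the symptom in the question.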

Re: In-place file filtering
by RMGir (Prior) on Jul 24, 2003 at 19:56 UTC
    One point others didn't mention: be _really_ careful when you're testing this kind of thing.

    It's way too easy to run a command twice and clobber your only backup if you're relying on -i.bak to preserve your originals: the second run overwrites each .bak file with the already-filtered copy. (Yes, that's the voice of painful experience talking :))
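One possible guard, sketched as a small script rather than a one-liner: refuse to touch any file whose .bak copy already exists. The in-place editing relies on the $^I variable, which is what the -i switch sets. The function name filter_in_place, the sample data and the refusal policy are assumptions for illustration, not something proposed in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Filter $file in place, keeping lines for which $keep->($line) is true.
# Refuses to run if "$file.bak" already exists, so a second run cannot
# overwrite the only copy of the original data.
sub filter_in_place {
    my ($file, $keep) = @_;
    my $backup = "$file.bak";
    if (-e $backup) {
        warn "skipping $file: $backup already exists\n";
        return 0;
    }
    local ($^I, @ARGV) = ('.bak', $file);   # same effect as -i.bak on one file
    while (my $line = <>) {
        print $line if $keep->($line);      # print goes to the rewritten file
    }
    return 1;
}

# Demonstration on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.dat";
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} "1\t7083\n3\t00725\n";
close $fh;

my $is_zip = sub { $_[0] =~ /^\d+\t\d{5}\s*$/ };
filter_in_place($file, $is_zip);   # first run: filters, leaves sample.dat.bak
filter_in_place($file, $is_zip);   # second run: refuses, backup is untouched
```

Testing on copies in a scratch directory, as the demonstration does, is the simpler precaution when that is practical.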
    --
    Mike

Re: In-place file filtering
by skyknight (Hermit) on Jul 24, 2003 at 19:33 UTC

    How about something along the lines of...

    perl -i.bak -n -a -e 'print if $F[1] =~ /\d{5}/ and $F[1] > 713 and $F[1] < 99930' *.dat

    You don't want the -p switch because you don't always want to print. Use the -n switch instead for the same implicit while loop, but without the implicit print.
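For reference, -a makes perl split each line into @F before the -e code runs. By default the split is on whitespace, which also works for these two-column tab-separated lines. A rough standalone equivalent of the one-liner's loop, minus the in-place editing, might look like the sketch below; the sub name filter_lines and the guard for short lines are illustrative additions.

```perl
use strict;
use warnings;

# Rough equivalent of: perl -n -a -e 'print if ...'
# -n supplies the loop, -a supplies the split into @F.
sub filter_lines {
    my @kept;
    for my $line (@_) {
        my @F = split ' ', $line;            # what -a does by default
        next unless defined $F[1];           # guard for short or blank lines
        push @kept, $line
            if $F[1] =~ /\d{5}/ and $F[1] > 713 and $F[1] < 99930;
    }
    return @kept;
}

print filter_lines("1\t7083\n", "1\t-7840\n", "20\t00000\n", "3\t00725\n");
```

Because "and" short-circuits, the numeric comparisons only run on fields that already contain five consecutive digits, so entries such as ---------- never reach them.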

Re: In-place file filtering
by dragonchild (Archbishop) on Jul 24, 2003 at 19:35 UTC
    warn "*$zip*"; next unless
    should be (note ; => ,)
    warn "*$zip*", next unless
    Also, you're not printing the good value back out. I would do something like:
    perl -pi.bak -e 'chomp;my($x,$y)=split/\t/;do{warn"*$y*\n";next}unless$y=~/^\d{5}$/&&$y>713&&$y<99930;print"$x\t$y\n"' *.dat
    Note: This code is lightly tested! YMMV

    Update: Heh - really lightly tested. My code works if you change -p to -n. Ignore me. :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: In-place file filtering
by adrianh (Chancellor) on Jul 24, 2003 at 21:10 UTC

    Variation on the theme:

    perl -MRegexp::Common=zip -i.bak -n -a -e 'print if m/\t$RE{zip}{US}$/' *.dat
      from the docs of Regexp::Common::zip
      US zip codes consist of 5 digits, with an optional 4 digit extension

      But this allows zip codes which are not currently valid.

      Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality

        Fair point. Might be worth dropping a bug-report/patch to Abigail-II.

        Where do you get the info on which US codes are valid/invalid?

Re: In-place file filtering
by flounder99 (Friar) on Jul 24, 2003 at 20:28 UTC
    AMEN to what RMGir said. Anyway here is my stab at it.
    perl -ni.bak -e '/^\d+\t(\d{5})/&&$1>713&&$1<99930?print:warn $_' *.dat

    --

    flounder

      I'd change that to:
      /^\d+\t(\d{5})\s*\n$/&&...
      The reason being that your code will pass 072300 as a valid zip code when it's obviously not.

      Now - I just thought of this: Only Zip5 will be allowed. Zip5+4 is not possible? If it is, change that regex to:

      /^\d+\t(\d{5})(-\d{4})?\s*\n$/&&...
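A quick way to compare these patterns is to run them over a few sample lines. In the sketch below the pattern names, the sub name passes, and the sample values other than 072300 are made up for illustration; the range check is the one used throughout the thread.

```perl
use strict;
use warnings;

my %pattern = (
    unanchored => qr/^\d+\t(\d{5})/,
    zip5       => qr/^\d+\t(\d{5})\s*\n$/,
    zip5_plus4 => qr/^\d+\t(\d{5})(-\d{4})?\s*\n$/,
);

# Returns 1 if $line matches the named pattern and the captured
# five-digit part falls inside the 713..99930 range, otherwise 0.
sub passes {
    my ($name, $line) = @_;
    return 0 unless $line =~ $pattern{$name};
    return ($1 > 713 && $1 < 99930) ? 1 : 0;
}

for my $line ("1\t00725\n", "1\t072300\n", "1\t00725-1234\n") {
    (my $shown = $line) =~ s/\n\z//;
    printf "%-14s %s\n", $shown,
        join ' ', map { "$_=" . passes($_, $line) } sort keys %pattern;
}
```

The unanchored pattern accepts 072300 because it only looks at the first five digits; both anchored variants reject it, and only the last one accepts a ZIP+4 value.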

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
