PerlMonks  

Re: Perl backticks and GREP?

by jakobi (Pilgrim)
on Oct 19, 2009 at 17:12 UTC [id://802037]


in reply to Perl backticks and GREP?

See also my recent Re: *SAFE* use of string in system command for safe use of filenames.

Use %ENV, plus the old standby of adding /dev/null as a second file argument, which forces grep to print filenames. That way, names containing quotes and worse won't create a mess.

# note that a simple pure perl version using 3arg open and
# e.g. $count=$contents=~s/$argument/$&/g will have
# about the same code length; besides offering way more
# cute regexes...
$ENV{file}=$file;          # protect filename from shell interpolation
$ENV{argument}=$argument;  # also protect non-trivial regexes from shell!
# upd: either -H OR /dev/null, sigh. a very telling misread of mine
$results=`grep -c -H "\$argument" "\$file"`; # /dev/null`; # note \$ instead of \"
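The pure-Perl route mentioned in the comment above might look like the following minimal sketch. It is not Peter's code: the helper name count_matching_lines is hypothetical, it counts matching lines (as grep -c does) rather than total matches (which is what the s/$argument/$&/g idiom counts), and the pattern is treated as a Perl regex, not a POSIX basic regex.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines of $file that match $pattern,
# mirroring `grep -c`. No shell is involved, so filenames containing
# quotes, spaces, semicolons or newlines need no special handling.
# Note: $pattern is used as a *Perl* regex, not a POSIX basic regex.
sub count_matching_lines {
    my ($file, $pattern) = @_;
    open(my $fh, '<', $file) or die "cannot open '$file': $!";
    my $re    = qr/$pattern/;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $re;
    }
    close($fh);
    return $count;
}

# Usage, following the structure of the original loop:
#   for my $file (glob '*.vdx') {
#       my $n = count_matching_lines($file, $argument);
#       print "$file:$n:$argument\n" if $n;
#   }
```

Since Perl opens the file itself, the filename is only ever data, never part of a command line.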

The above is, as usual, incomplete and a bold lie: the reporting will still suffer if a filename contains an embedded but unfiltered \n (UTF-8 probably also offers some more interesting whitespace to play with, and names with terminal escape sequences are also fun in print, warn and die). But \n in particular really warrants, IMAO, a cronjob and a shoot-and-kill policy for both file and file creator, up to and including layer 8.
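One possible mitigation for the reporting problem, sketched here as an assumption rather than the author's approach: escape control characters (and backslashes, so the escaping stays unambiguous) before a filename goes into a report line. The helper name printable_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: make a filename safe to embed in a one-line
# report. Every control character (newline, tab, ESC, DEL, ...) and every
# backslash is replaced by a visible \xNN escape, so a name containing
# "\n" cannot split a "filename:count:argument" line, and a name with
# terminal escape sequences cannot drive the terminal via print/warn/die.
sub printable_name {
    my ($name) = @_;
    (my $safe = $name) =~ s/([[:cntrl:]\\])/sprintf('\\x%02X', ord $1)/ge;
    return $safe;
}

print printable_name("evil\nname.vdx"), ":2:myargument\n";
# prints: evil\x0Aname.vdx:2:myargument
```

This only protects the report; the filename still has to reach grep safely by other means.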

cu
Peter

Update:

1. Of course, this (unnecessary) last-minute addition of /dev/null belongs before the closing backtick, just as your first parsing error indicated. Sorry, fixed above.

2. Oops: in your example below, you need to move $ENV{file} down into the first foreach loop, and likewise $ENV{argument} into the second loop, after $argument has been assigned. Better yet, trading a little efficiency for readability, set both just before the grep.

3. OK, compare your version with the full working one here, and do check the comments:

#!/usr/bin/perl -w
#this file is for grepping within vdx files, and outputs the following:
#filename:<#of hits>:<arguments>
#filename:2:myargument
open(ARGUMENTS, "<", "file") or die "cannot open file: $!";
@files = <*.vdx>;
@arguments = <ARGUMENTS>;
foreach $file (@files) {
    #$copyfile="\"".$file."\"";
    foreach $argument (@arguments) {
        chomp $argument;
        #print "$copyfile\n";
        # 0. move these _down_ to here
        ($ENV{f},$ENV{a})=($file,$argument);
        # 1. consider egrep instead for a bit more readable posix regexes
        # 2. two ways to make grep include file names: the newish -H or
        #    the classic /dev/null trick (the last-second change was also a
        #    last-check misreading of your opening as implying a missing _file_ argument :/)
        $results=`grep -H -c "\$a" "\$f"`; # /dev/null`;
        # 3. do you really want to see non-match reports?
        $rc=$?; next if $rc; # nothing to report
        # 4. but gnu grep ... /dev/null now insists on summarizing non-hits as well
        # $results=~s/\n.*//sgo;
        chomp $results;
        print "$results:$argument\n";
    }
}
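Where no shell features are needed at all, the shell can also be bypassed entirely. The following is a sketch of that alternative, not the method used in the script above: the helper name grep_count is hypothetical, it relies on Perl's list form of a piped open (available on Unix-like systems), and it assumes a grep that supports -H (GNU and BSD grep do). The -e and -- options are additions that stop a pattern or filename starting with "-" from being read as an option.

```perl
use strict;
use warnings;

# Hypothetical helper: run `grep -c -H` with no shell at all. The list
# form of a piped open hands each argument directly to the program, so
# nothing in $pattern or $file is ever parsed as shell syntax.
sub grep_count {
    my ($pattern, $file) = @_;
    open(my $fh, '-|', 'grep', '-c', '-H', '-e', $pattern, '--', $file)
        or die "cannot run grep: $!";
    my $out = do { local $/; <$fh> } // '';
    close($fh);                # closing the pipe waits for grep and sets $?
    my $status = $? >> 8;      # grep: 0 = match, 1 = no match, 2 = error
    chomp $out;
    return ($status, $out);    # $out looks like "filename:count"
}

# Usage inside the original loops:
#   my ($status, $out) = grep_count($argument, $file);
#   print "$out:$argument\n" if $status == 0;
```

The trade-off is that shell conveniences (redirections, pipes, globbing) are unavailable, which is exactly why nothing in the arguments can trigger them.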

Re^2: Perl backticks and GREP?
by symgryph (Sexton) on Oct 19, 2009 at 18:06 UTC
    #!/usr/bin/perl -w
    #this file is for grepping within vdx files, and outputs the following:
    #filename:<#of hits>:<arguments>
    #filename:2:myargument
    open ARGUMENTS, "<","file";
    @files = <*.vdx>;
    @arguments = <ARGUMENTS>;
    $ENV{file}=$file;
    $ENV{argument}=$argument;
    foreach $file (@files) {
        #$copyfile="\"".$file."\"";
        foreach $argument (@arguments) {
            chomp $argument;
            #print "$copyfile\n";
            $results=`grep -c -H "\$argument" "\$file"` /dev/null;
            chomp $results;
            print "$results:$arguments\n";
        }
    }

    results in the following output:

    Unquoted string "dev" may clash with future reserved word at ./nelson.pl line 15.
    Unquoted string "null" may clash with future reserved word at ./nelson.pl line 15.
    Use of uninitialized value in scalar assignment at ./nelson.pl line 8, <ARGUMENTS> line 1.
    Use of uninitialized value in scalar assignment at ./nelson.pl line 9, <ARGUMENTS> line 1.
    grep: : No such file or directory
    Argument "dev" isn't numeric in division (/) at ./nelson.pl line 15, <ARGUMENTS> line 1.
    Argument "" isn't numeric in division (/) at ./nelson.pl line 15, <ARGUMENTS> line 1.
    Illegal division by zero at ./nelson.pl line 15, <ARGUMENTS> line 1.
    "Two Wheels good, Four wheels bad."
      Unquoted string "dev" may clash with future reserved word at ./nelson.pl line 15.
      Unquoted string "null" may clash with future reserved word at ./nelson.pl line 15.
      ...
      Argument "dev" isn't numeric in division (/) at ./nelson.pl line 15, <ARGUMENTS> line 1.
      Argument "" isn't numeric in division (/) at ./nelson.pl line 15, <ARGUMENTS> line 1.
      Illegal division by zero at ./nelson.pl line 15, <ARGUMENTS> line 1.

      All of the above are generated by the /dev/null you have marooned outside the backticks: Perl parses `...` /dev/null as the command's output divided by the barewords dev and null. Are you attempting to redirect the output of grep?

      Use of uninitialized value in scalar assignment at ./nelson.pl line 8, <ARGUMENTS> line 1.
      Use of uninitialized value in scalar assignment at ./nelson.pl line 9, <ARGUMENTS> line 1.
      You don't seem to have initialised $file or $argument anywhere before trying to use them.
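A minimal sketch of the loop with the environment assignments moved inside it, so they happen after $file and $argument actually have values. It is a rewrite for illustration, not the original poster's code: the helper name count_all is hypothetical, `local` is used so each iteration restores %ENV afterwards, and -e/-- were added to the grep command line. It assumes a grep that supports -H and patterns that have already been chomped.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the loop: each value is copied into %ENV
# *inside* the loop, after it has been assigned, and `local` restores
# the previous environment when the iteration ends. The shell expands
# "$f" and "$a" itself, inside double quotes, so their contents are
# never re-parsed as shell syntax.
sub count_all {
    my ($files, $patterns) = @_;
    my @report;
    for my $file (@$files) {
        for my $pattern (@$patterns) {
            local $ENV{f} = $file;
            local $ENV{a} = $pattern;
            # \$ keeps Perl from interpolating; the shell does it instead
            my $out = `grep -c -H -e "\$a" -- "\$f"`;
            next if $?;                 # 1 = no match, 2 = error
            chomp $out;
            push @report, "$out:$pattern";
        }
    }
    return @report;
}
```

Each report entry has the filename:count:argument shape described in the script's header comment.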

      grep: : No such file or directory

      I don't think you should be escaping the variables inside the backticks. They need to interpolate.

      I hope these points help you move forward.

      Cheers,

      JohnGG

        I'm the misleading culprit here with regard to the /dev/null quoting and the associated error messages; see the update in the second half of my initial comment, three levels up. It's partly because I never use -H and thus managed to misread something just before posting.

        /dev/null and quoting: Marooned it may be, redirect it isn't :)

        > I don't think you should be escaping the variables inside the backticks. They need to interpolate.

        Wrong.

        <insert pet-peeve alert spoiler warning>
        (the following is a bit Unixish, but when I see grep, I think I'm safe enough to assume cygwin or better)

        NOT INTERPOLATING is exactly the idea here: it makes the grep invocation secure regardless of what kind of stupid filenames or interesting regexes appear:

        Think about the full set of (pathological) filenames that can be matched with e.g. <*.vdx>, then about regexes, and finally:

        Place the variable into the environment and push the variable interpolation into the shell. The shell now interpolates the variables, but the double quotes stop it from doing word splitting or worse. Better yet, since Perl does not interpolate the variable into the command line before exec'ing the shell for system("grep ...") or backticks, the shell never sees the characters in the filename or regex as shell special characters to act upon.
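The difference can be seen with a small demonstration that swaps grep for a harmless printf. The hostile filename is made up for illustration, and the sketch assumes a POSIX /bin/sh.

```perl
use strict;
use warnings;

# A made-up hostile "filename" containing quotes and semicolons.
my $name = q{x"; echo INJECTED; echo "y};

# Unsafe: Perl interpolates $name into the command line, so the shell
# sees:  printf '%s' "x"; echo INJECTED; echo "y"
my $unsafe = `printf '%s' "$name"`;

# Safe: the value travels in the environment. Perl passes the literal
# text "$f", and the shell substitutes it inside double quotes without
# re-parsing the quotes and semicolons it contains.
$ENV{f} = $name;
my $safe = `printf '%s' "\$f"`;

print "unsafe: $unsafe";   # the injected echo ran
print "safe:   $safe\n";   # the name comes back verbatim
```

The only difference between the two calls is the backslash in front of the dollar sign.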

        If you think you can protect your command arguments with just a set of quotes around the filename or regex, do think about quotes, newlines (yes!) and semicolons in the filename. Now puzzle out what the shell actually sees as the command line and as command arguments. If still in doubt, check out the link I mentioned at the top of my previous comment for the bigger picture.

        And if you think your filenames are well-controlled and sane, and thus don't require caution, you still have to take care of the POSIX basic regex argument, in case grep should ever match more than just alphanumerics.
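A small sketch of why the regex argument matters even with sane filenames; the argument value and sample lines are made up. In Perl, \Q...\E (quotemeta) turns the argument into a literal string; on the grep side, the POSIX -F option plays the same role.

```perl
use strict;
use warnings;

# Made-up example: the same argument means different things depending
# on whether it is treated as a regex or as a literal string.
my $argument = 'a.c';
my @lines    = ('abc', 'a.c', 'ac');

my @as_regex   = grep { /$argument/ }     @lines;  # "." matches any char
my @as_literal = grep { /\Q$argument\E/ } @lines;  # quotemeta: literal "."

print "regex:   @as_regex\n";     # regex:   abc a.c
print "literal: @as_literal\n";   # literal: a.c
```

Which behaviour is right depends on whether the argument file is meant to hold patterns or plain strings.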

        cu, Peter
