http://qs321.pair.com?node_id=929876


in reply to Regex matching after ASCII characters

    $file =~ m{ \A [A-Z]{6} [A-Z0-9]{2} $prompt_host }xms;

But it wouldn't be a bad idea to do some sanity checks on $prompt_host before blindly injecting user-entered text into your regexp.
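One possible shape for such a sanity check is a whitelist. This is only a sketch: it assumes the host codes consist of uppercase letters and digits separated by `|`, which the thread does not state, and `safe_host_pattern` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): accept only alternatives made
# of A-Z and 0-9, separated by '|'. Returns a compiled, grouped regex on
# success and undef for anything else.
sub safe_host_pattern {
    my ($input) = @_;
    return unless defined $input
        and $input =~ m{ \A [A-Z0-9]+ (?: \| [A-Z0-9]+ )* \z }xms;
    return qr{ (?: $input ) }xms;
}

for my $entry ('BR|GR', 'FI', '++', '(?{ 1 })') {
    my $rx = safe_host_pattern($entry);
    print defined $rx ? "accepted: $entry\n" : "rejected: $entry\n";
}
```

Because the helper wraps the input in a non-capturing group, the result can be dropped into a larger pattern without the alternation leaking out of its intended position.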

                - Ant
                - Some of my best work - (1 2 3)

Re^2: Regex matching after ASCII characters
by Anonymous Monk on Oct 05, 2011 at 21:11 UTC

    Thanks for your quick response, I tried your code and am getting the same false positives.

    Example: with $prompt_host = "BR|GR", filenames such as YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt and GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt are erroneously being pushed. Here is how I implemented your code:

        while (my $file = readdir(DIR)) {
            if ($file =~ m{ \A [A-Z0-9]{8} $prompt_host }xms) {
                push(@traffic_file_list, $file);
            }
        }

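      The problem with your code is the "BR|GR". Alternation has the lowest precedence in a regex, so the | splits the whole pattern, and Perl will now match either of these two:
          \A [A-Z0-9]{8} BR
      OR:
          GR
      The second one matches any filename that contains "GR" anywhere. The solution would be to group the alternation by setting $prompt_host to:
          $prompt_host = '(?:BR|GR)';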
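To see the difference concretely, here is a minimal sketch that runs the two filenames from the question against both forms of the host pattern. The helper name `pushes_file` is made up for illustration; the pattern is the one from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file matches the thread's pattern with
# $host interpolated as-is (no extra grouping added here).
sub pushes_file {
    my ($file, $host) = @_;
    return $file =~ m{ \A [A-Z0-9]{8} $host }xms ? 1 : 0;
}

my @files = (
    'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt',
    'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt',
);

for my $file (@files) {
    printf "%s\n  bare 'BR|GR': %d  grouped '(?:BR|GR)': %d\n",
        $file,
        pushes_file($file, 'BR|GR'),
        pushes_file($file, '(?:BR|GR)');
}
```

With the bare alternation both files match, because each contains "GR" somewhere; with the grouped form neither does, since characters 9 and 10 are "FI" and "BE".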

        Perfect, thank you guys for your help.

      And with some sanity checking:

      >perl -wMstrict -le
      "my @t = (
         'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt',
         'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt',
         );
       ;;
       ENTRY:
       for my $entry ('BR|RG', 'FI', '(?{ `rm -R *` })', '++') {
         my $rx = eval { qr{ \A [A-Z]{6} [A-Z\d]{2} (?: $entry) }xms };
         if ($@) {
           print qq{user entered '$entry' is evil: $@};
           next ENTRY;
           }
         for my $t (@t, @ARGV) {
           printf qq{%7s %-3smatch with '$t' \n},
             qq{'$entry'}, $t =~ $rx ? '' : 'NO'
             ;
           }
         }
      " "STTLWA02RG01_2011-10-05.00.00.00.txt"
      'BR|RG' NO match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
      'BR|RG' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
      'BR|RG'    match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
         'FI'    match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
         'FI' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
         'FI' NO match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
      user entered '(?{ `rm -R *` })' is evil: Eval-group not allowed at runtime, use re 'eval' in regex m/ \A [A-Z]{6} [A-Z\d]{2} (?: (?{ `rm -R *` })) / at ...
      user entered '++' is evil: Quantifier follows nothing in regex; marked by <-- HERE in m/ \A [A-Z]{6} [A-Z\d]{2} (?: + <-- HERE +) / at ...