Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Thank you in advance for answering my question. I have been searching all afternoon for the correct regex to use, and I feel the need to escalate the issue.

I am searching through a series of files. The first 6 characters of each filename are guaranteed to be A-Z, and the next two characters will be any combination of A-Z and 0-9. The following two characters are A-Z. It is these last two characters that I am trying to match. I am running into problems where I accidentally match characters from the first part of the filename. E.g., if I were searching for TL, STTLWA02TL01_2011-10- would correctly match, but STTLWA02HJ01_2011-10- would erroneously match.

So my question is: how can I match my desired characters after the 8 initial characters?

This regex is obviously insufficient, since if $prompt_host = "TL", it will match it anywhere in the filename.

#Populate @traffic_file_list
while (my $file = readdir(DIR)) {
    if ($file =~ $prompt_host) {
        push(@traffic_file_list, $file);
    }
}

Thank you!

Replies are listed 'Best First'.
Re: Regex matching after ASCII characters
by suaveant (Parson) on Oct 05, 2011 at 20:31 UTC
    $file =~ m{ \A [A-Z]{6} [A-Z0-9]{2} $prompt_host }xms;

    But it wouldn't be a bad idea to do some sanity checks on $prompt_host before randomly injecting user entered text into your regexp.

                    - Ant
                    - Some of my best work - (1 2 3)
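A minimal, self-contained sketch of the anchored match suggested above, run against the two filenames from the question. The @files list stands in for the readdir loop and is only illustrative. The \Q...\E quoting is an addition not present in the reply: it assumes $prompt_host is a single literal code such as "TL", and it would also turn a "|" into a literal character, so it is not suitable if alternation is wanted.

```perl
use strict;
use warnings;

# Illustrative stand-in for the directory listing: the two names from the question.
my @files = ('STTLWA02TL01_2011-10-', 'STTLWA02HJ01_2011-10-');

# Assumption: a single literal two-letter code, not a pattern.
my $prompt_host = 'TL';

# \A anchors at the start of the name, the two character classes consume
# the fixed 8-character prefix, and only then is the code required.
# \Q...\E makes any regex metacharacters in the value literal.
my $rx = qr{ \A [A-Z]{6} [A-Z0-9]{2} \Q$prompt_host\E }xms;

my @traffic_file_list = grep { $_ =~ $rx } @files;

print "$_\n" for @traffic_file_list;    # prints STTLWA02TL01_2011-10-
```

Only the first name is kept, because in the second one the two characters after the 8-character prefix are "HJ".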

      Thanks for your quick response. I tried your code and am getting the same false positives.

      Example: with $prompt_host = "BR|GR", filenames such as YUGRABCKFI01- and GRREPCCOBE10- are erroneously being pushed. Here is how I implemented your code:

      while (my $file = readdir(DIR)) {
          if ($file =~ m{ \A [A-Z0-9]{8} $prompt_host }xms) {
              push(@traffic_file_list, $file);
          }
      }
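        The problem with your code is the "BR|GR". Because alternation has the lowest precedence in a regex, Perl will now match either of these two alternatives:
        \A [A-Z0-9]{8} BR
        GR
        The second one is unanchored, so it matches "GR" anywhere in the filename. The solution would be to wrap the alternation in a non-capturing group by setting $prompt_host to:
        $prompt_host = '(?:BR|GR)';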
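The precedence issue described above can be checked with a short sketch that compares the ungrouped and grouped patterns. The first two filenames come from the follow-up post. The third is a hypothetical name, built from the question's example, that has "GR" in positions 9 and 10. Here the group is placed in the pattern itself rather than in $prompt_host, which has the same effect.

```perl
use strict;
use warnings;

my @files = (
    'YUGRABCKFI01-',            # from the post: "GR" only inside the prefix
    'GRREPCCOBE10-',            # from the post: "GR" only inside the prefix
    'STTLWA02GR01_2011-10-',    # hypothetical: "GR" right after the prefix
);

my $prompt_host = 'BR|GR';

# Ungrouped: parsed as ( \A [A-Z0-9]{8} BR ) OR ( GR ).
my $ungrouped = qr{ \A [A-Z0-9]{8} $prompt_host }xms;

# Grouped: the alternation is confined to the two characters after the prefix.
my $grouped = qr{ \A [A-Z0-9]{8} (?: $prompt_host ) }xms;

my @ungrouped_hits = grep { $_ =~ $ungrouped } @files;
my @grouped_hits   = grep { $_ =~ $grouped } @files;

print "ungrouped: @ungrouped_hits\n";    # all three names
print "grouped:   @grouped_hits\n";      # only STTLWA02GR01_2011-10-
```

The ungrouped pattern accepts every name that contains "GR" anywhere, which reproduces the false positives reported above.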

        And with some sanity checking:

        >perl -wMstrict -le
        "my @t = (
           'YUGRABCKFI01-', 'GRREPCCOBE10-',
           );
         ;;
         ENTRY:
         for my $entry ('BR|RG', 'FI', '(?{ `rm -R *` })', '++') {
           my $rx = eval { qr{ \A [A-Z]{6} [A-Z\d]{2} (?: $entry) }xms };
           if ($@) {
             print qq{user entered '$entry' is evil: $@};
             next ENTRY;
             }
           for my $t (@t, @ARGV) {
             printf qq{%7s %-3smatch with '$t' \n},
               qq{'$entry'}, $t =~ $rx ? '' : 'NO'
               ;
             }
           }
        "
        "STTLWA02RG01_2011-10-"
        'BR|RG' NO match with 'YUGRABCKFI01-'
        'BR|RG' NO match with 'GRREPCCOBE10-'
        'BR|RG'    match with 'STTLWA02RG01_2011-10-'
           'FI'    match with 'YUGRABCKFI01-'
           'FI' NO match with 'GRREPCCOBE10-'
           'FI' NO match with 'STTLWA02RG01_2011-10-'
        user entered '(?{ `rm -R *` })' is evil: Eval-group not allowed at runtime, use re 'eval' in regex m/ \A [A-Z]{6} [A-Z\d]{2} (?: (?{ `rm -R *` })) / at ...
        user entered '++' is evil: Quantifier follows nothing in regex; marked by <-- HERE in m/ \A [A-Z]{6} [A-Z\d]{2} (?: + <-- HERE +) / at ...
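The eval-based check above rejects input that fails to compile. A stricter alternative, sketched here as an assumption rather than as anything proposed in the thread, is to accept only input that looks like one or more two-letter uppercase codes separated by "|" before it is interpolated. The helper name build_host_rx is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled pattern, or undef when the input
# is not a '|'-separated list of two-letter uppercase codes.
sub build_host_rx {
    my ($entry) = @_;
    return undef
        unless defined $entry
        && $entry =~ m{ \A [A-Z]{2} (?: \| [A-Z]{2} )* \z }xms;

    # Safe to interpolate: the value contains only A-Z and '|'.
    return qr{ \A [A-Z]{6} [A-Z0-9]{2} (?: $entry ) }xms;
}

for my $entry ('BR|RG', 'FI', '++') {
    my $rx = build_host_rx($entry);
    if (!defined $rx) {
        print "rejected '$entry'\n";
        next;
    }
    for my $t ('YUGRABCKFI01-', 'STTLWA02RG01_2011-10-') {
        printf "%-7s %s %s\n", $entry, ($t =~ $rx ? 'matches' : 'misses '), $t;
    }
}
```

With this whitelist, anything containing regex metacharacters other than "|" is rejected before a pattern is built, so input such as "++" or an embedded code block never reaches the regex engine.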