Regex matching after ASCII characters

by Anonymous Monk
on Oct 05, 2011 at 20:15 UTC ( #929873=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Thank you in advance for answering my question. I have been searching all afternoon for the correct regex to use and I feel the need to escalate the issue.

I am searching through a series of files. The first 6 characters of each filename are guaranteed to be A-Z, the next two characters will be any combination of A-Z and 0-9, and the following two characters are A-Z. It is these last two characters that I am trying to match. I am running into problems where I accidentally match characters from the first part of the filename. E.g., if I were searching for TL, STTLWA02TL01_2011-10-05.00.00.00.txt would correctly match, but STTLWA02HJ01_2011-10-05.00.00.00.txt would erroneously match.

So my question is: how can I match my desired characters after the 8 initial characters?

This regex is obviously insufficient, since if $prompt_host = "TL", it will match it anywhere in the filename.

#Populate @traffic_file_list
while (my $file = readdir(DIR)) {
    if($file =~ $prompt_host) {
        push(@traffic_file_list, $file);
    }
}

Thank you!

Replies are listed 'Best First'.
Re: Regex matching after ASCII characters
by suaveant (Parson) on Oct 05, 2011 at 20:31 UTC
    $file =~ m{ \A [A-Z]{6} [A-Z0-9]{2} $prompt_host }xms;

    But it wouldn't be a bad idea to do some sanity checks on $prompt_host before randomly injecting user entered text into your regexp.

                    - Ant
                    - Some of my best work - (1 2 3)
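    One way to sketch the sanity check suggested above is a whitelist: accept the input only if it looks like one or more two-letter codes joined by '|', and only then build the anchored regex. The helper name build_host_rx and the rule for what counts as valid input are assumptions made for illustration, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only two-letter uppercase codes, optionally
# joined by '|' (e.g. "TL" or "BR|GR"). Returns a compiled regex, or
# undef when the input does not look like that.
sub build_host_rx {
    my ($prompt_host) = @_;
    return unless defined $prompt_host
        && $prompt_host =~ m{ \A [A-Z]{2} (?: \| [A-Z]{2} )* \z }xms;

    # Six letters, two letters-or-digits, then the requested code(s).
    # The (?: ) group keeps any alternation confined to this position.
    return qr{ \A [A-Z]{6} [A-Z0-9]{2} (?: $prompt_host ) }xms;
}

for my $input ('TL', 'BR|GR', '++', 'tl') {
    my $rx = build_host_rx($input);
    print "$input: ", (defined $rx ? 'accepted' : 'rejected'), "\n";
}
```

    A whitelist like this is stricter than it needs to be for some uses, but it keeps regex metacharacters out of the pattern entirely.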

      Thanks for your quick response. I tried your code and am getting the same false positives.

      Example: with $prompt_host = "BR|GR", filenames such as YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt and GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt are erroneously being pushed. Here is how I implemented your code:

      while (my $file = readdir(DIR)) {
          if($file =~ m{ \A [A-Z0-9]{8} $prompt_host }xms) {
              push(@traffic_file_list, $file);
          }
      }
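        The problem with your code is the "BR|GR". Alternation has the lowest precedence in a regex, so after interpolation Perl will match either of these two alternatives:
        \A [A-Z0-9]{8} BR
        OR:
        GR
        The second one matches "GR" anywhere in the filename. The solution is to wrap the alternation in a non-capturing group, for example by setting $prompt_host to:
        $prompt_host = '(?:BR|GR)';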
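        To see the precedence issue in isolation, here is a small sketch that runs the ungrouped and grouped forms against one of the filenames from the thread. The variable names $ungrouped and $grouped are just for illustration.

```perl
use strict;
use warnings;

my $prompt_host = 'BR|GR';
my $file        = 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt';

# Ungrouped: parsed as (\A [A-Z0-9]{8} BR) OR (GR), so the bare "GR"
# branch is unanchored and can match anywhere in the name.
my $ungrouped = qr{ \A [A-Z0-9]{8} $prompt_host }xms;

# Grouped: the alternation is confined to the two characters that
# follow the 8-character prefix.
my $grouped = qr{ \A [A-Z0-9]{8} (?: $prompt_host ) }xms;

printf "ungrouped: %s\n", $file =~ $ungrouped ? 'match' : 'no match';
printf "grouped:   %s\n", $file =~ $grouped   ? 'match' : 'no match';
```

        The ungrouped pattern reports a match (via the leading "GR"), while the grouped one does not, because characters 9 and 10 of the name are "BE". Putting the (?: ) in the pattern rather than in $prompt_host means the user input can stay as plain "BR|GR".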

        And with some sanity checking:

        >perl -wMstrict -le
        "my @t = (
           'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt',
           'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt',
           );
         ;;
         ENTRY:
         for my $entry ('BR|RG', 'FI', '(?{ `rm -R *` })', '++') {
             my $rx = eval { qr{ \A [A-Z]{6} [A-Z\d]{2} (?: $entry) }xms };
             if ($@) {
                 print qq{user entered '$entry' is evil: $@};
                 next ENTRY;
                 }
             for my $t (@t, @ARGV) {
                 printf qq{%7s %-3smatch with '$t' \n},
                   qq{'$entry'}, $t =~ $rx ? '' : 'NO'
                   ;
                 }
             }
        "
        "STTLWA02RG01_2011-10-05.00.00.00.txt"
        'BR|RG' NO match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
        'BR|RG' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
        'BR|RG'    match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
           'FI'    match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
           'FI' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
           'FI' NO match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
        user entered '(?{ `rm -R *` })' is evil: Eval-group not allowed at runtime, use re 'eval' in regex m/ \A [A-Z]{6} [A-Z\d]{2} (?: (?{ `rm -R *` })) / at ...
        user entered '++' is evil: Quantifier follows nothing in regex; marked by <-- HERE in m/ \A [A-Z]{6} [A-Z\d]{2} (?: + <-- HERE +) / at ...
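        For completeness, here is a rough sketch of how the eval-guarded regex above might be dropped back into the original directory loop. The helper name traffic_files, the lexical directory handle, and scanning the current directory are assumptions for illustration; the original post used a bareword DIR handle opened elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries of $dir whose names match $rx.
sub traffic_files {
    my ($dir, $rx) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @traffic_file_list = grep { $_ =~ $rx } readdir($dh);
    closedir($dh);
    return @traffic_file_list;
}

my $entry = 'BR|RG';    # assumed to have come from the user

# Compile once, inside eval, so a malformed pattern is reported
# instead of killing the script.
my $rx = eval { qr{ \A [A-Z]{6} [A-Z\d]{2} (?: $entry ) }xms };

if (defined $rx) {
    print "$_\n" for traffic_files('.', $rx);
}
else {
    print "rejected '$entry': $@";
}
```

        Note that the eval only catches patterns that fail to compile. It does not stop a user from entering a valid but overly broad pattern such as ".*", so a stricter check on the input may still be worthwhile.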
