http://qs321.pair.com?node_id=1108688


in reply to Pattern matching exlusion

You don't specify which characters are valid as part of the identifier, so the following assumes that any letter or number is okay. From your code snippet, it looked as if you wanted to allow either upper or lower case matching as well:

    my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;

    while (my $form_type = <DATA>) {
        if ($form_type =~ /^$formget/) {
            print "Valid: $form_type";
        }
        else {
            print "Invalid: $form_type";
        }
    }

    __DATA__
    10-K
    10-KSB
    10-K405
    10-ksb405
    10-Q
    10-K/A
    10-Q/A
Output:
    Valid: 10-K
    Valid: 10-KSB
    Valid: 10-K405
    Valid: 10-ksb405
    Valid: 10-Q
    Invalid: 10-K/A
    Invalid: 10-Q/A

Check out the section titled "Look-Around Assertions" in perlre, which covers the (?! ...) construct, known as a negative lookahead. It asserts that the enclosed subpattern must not match immediately after the current position, and it does so without consuming any characters.
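To see a negative lookahead in isolation, here is a minimal sketch built on the /foo(?!bar)/ example from perlre. The sub name foo_without_bar is made up for illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the text contains "foo" that is
# NOT immediately followed by "bar".
sub foo_without_bar {
    my ($text) = @_;
    return $text =~ /foo(?!bar)/ ? 1 : 0;
}

for my $text (qw(foobaz foo foobar)) {
    printf "%-7s %s\n", $text,
        foo_without_bar($text) ? 'matches' : 'does not match';
}
```

The lookahead only inspects what comes next; it never becomes part of the matched text, so the match on "foobaz" is just "foo".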

Re^2: Pattern matching exlusion
by AnomalousMonk (Archbishop) on Nov 29, 2014 at 04:53 UTC
    my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;

    The problem with this regex when used in a /^$formget/ match is that it allows strings like 10-KA/ to match. When the lookahead finds the / at the end of 10-KA/ and fails, the regex can backtrack so that [A-Z0-9]* matches nothing, leaving just 10-K, and then look ahead to A, which is not a / character! BrowserUk's approach below gets around this by using an end-of-string anchor assertion to make sure that only non-/ characters follow the [KQ]. If anchoring were not possible, another way would be to use an 'atomic' (?>pattern) group, which will not allow backtracking into it:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @tests = qw[
       10-K  10-KSB  10-K405  10-KSB405  10-Q
       10-K/B  10-KSB/ABC  10-K405/A  10-KSB405/A  10-Q/A  10-KA/
     ];
     ;;
     my $formget = qr{ (?i) 10- [KQ] [A-Z0-9]* (?! /) }xms;
     printf 'valid: ';
     /^$formget/ and printf qq{'$_' } for @tests;
     print '';
     ;;
     my $formget2 = qr{ (?> (?i) 10- [KQ] [A-Z0-9]*) (?! /) }xms;
     printf 'valid2: ';
     /^$formget2/ and printf qq{'$_' } for @tests;
    "
    valid: '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q' '10-KSB/ABC' '10-K405/A' '10-KSB405/A' '10-KA/'
    valid2: '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q'
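    For comparison, here is a rough sketch of the end-of-string anchoring idea mentioned above. This is not BrowserUk's actual code; it assumes a valid form type is "10-", then K or Q, then only letters or digits up to the very end of the string. The names $formget_anchored and is_valid_form are invented for the example.

```perl
use strict;
use warnings;

# Assumption: nothing but letters and digits may follow the K or Q,
# so the whole string is anchored with \A and \z.
my $formget_anchored = qr/\A 10- [KQ] [A-Z0-9]* \z/xi;

# Hypothetical helper: 1 if the form type is acceptable, else 0.
sub is_valid_form {
    my ($form_type) = @_;
    return $form_type =~ $formget_anchored ? 1 : 0;
}

for my $form_type (qw(10-K 10-KSB405 10-Q 10-K/A 10-KSB/ABC 10-KA/)) {
    printf "%-11s %s\n", $form_type,
        is_valid_form($form_type) ? 'valid' : 'invalid';
}
```

    With both ends anchored there is no shorter match to backtrack to, so no lookahead is needed at all. Note that \z does not tolerate a trailing newline, so lines read from a file handle would need a chomp first.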

      Thank you AnomalousMonk, that makes sense. I really appreciate your time and interest. Have a great weekend!
Re^2: Pattern matching exlusion
by wrkrbeee (Scribe) on Nov 29, 2014 at 02:54 UTC
    Hi Loops, Thank you for your help. Tell me if I am wrong concerning the interpretation of your expression: qr(10-[KQ][A-Z0-9]*(?!/))i;

    1. You are saying find a string (QR)
    2. which begins with 10-K or 10-Q [KQ]
    3. followed by any letter from A-Z [A-Z]
    4. followed by any number 0-9 [0-9]
    5. excluding any string which terminates with slash *(?!/))i

    Is that right? If so, how would the expression change if I were looking for upper case letters only? Your insight is invaluable, and I am grateful! Rick

      The [A-Z0-9] part of the expression is a single character class that matches any one letter or digit. It's followed by an asterisk "*", which means zero or more instances, so "1B3" and "K4R" both match at that point in the pattern.

      To use upper case only, remove the trailing "i" from the initial $formget assignment.
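      A small sketch of the difference the trailing "i" makes, using the pattern as posted. The variable and sub names here are made up for the comparison.

```perl
use strict;
use warnings;

my $formget_any_case   = qr(10-[KQ][A-Z0-9]*(?!/))i;  # as posted
my $formget_upper_only = qr(10-[KQ][A-Z0-9]*(?!/));   # trailing "i" removed

# Hypothetical helpers wrapping the two matches.
sub matches_any_case {
    my ($form_type) = @_;
    return $form_type =~ /^$formget_any_case/ ? 1 : 0;
}

sub matches_upper_only {
    my ($form_type) = @_;
    return $form_type =~ /^$formget_upper_only/ ? 1 : 0;
}

for my $form_type (qw(10-KSB 10-ksb405 10-Q 10-q)) {
    printf "%-10s any case: %-3s upper only: %s\n", $form_type,
        matches_any_case($form_type)   ? 'yes' : 'no',
        matches_upper_only($form_type) ? 'yes' : 'no';
}
```

      Without the "i", the lower-case entries fail at the [KQ] step, because "k" and "q" are no longer accepted there.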

        Thank you Loops, again I am grateful for your help. Could not do it without your insight. Thanks, and Happy Holidays! Rick