PerlMonks  

Pattern matching exclusion

by wrkrbeee (Scribe)
on Nov 29, 2014 at 02:30 UTC

wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, Looking for a fairly simple pattern matching expression, but I am coming up short. Would like to enhance the following IF expression:
if ($form_type=~/^$formget/i)
Valid strings for $formget are '10-K', '10-KSB', '10-K405', '10-KSB405' and '10-Q'. Deal killers are any of the above followed by '/'. For example, 10-K/A, 10-Q/A ... are invalid strings. In sum, the goal is to keep any string which begins with 10-K or 10-Q, AND does not terminate with a slash (/). I am most grateful for any ideas. Thank you!! Rick

Replies are listed 'Best First'.
Re: Pattern matching exclusion
by BrowserUk (Patriarch) on Nov 29, 2014 at 02:45 UTC

    Like this?

    @tests = qw[ 10-K 10-KSB 10-K405 10-KSB405 10-Q 10-K/B 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A ];;
    printf "%s : %s\n", $_, $_ =~ m[10-[KQ][^/]*$] ? 'matches.' : 'does not match.' for @tests;;
    10-K : matches.
    10-KSB : matches.
    10-K405 : matches.
    10-KSB405 : matches.
    10-Q : matches.
    10-K/B : does not match.
    10-KSB/ABC : does not match.
    10-K405/A : does not match.
    10-KSB405/A : does not match.
    10-Q/A : does not match.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi BrowserUk, Yup, you nailed it! Exactly what I'm looking for. I am most grateful for your willingness to help. Can't thank you enough! Happy Holidays! Rick
      Hi BrowserUk, Question: my current pattern matching line of code looks like this: if ($form_type=~/^$formget/i); How would I incorporate the code you suggested in your post? Presumably, I would add m[10-KQ^/*$] , but I am unsure how the syntax plays out. Thank you!!
        ... m[10-KQ^/*$] ...

        Huh?!? You really need to start using <code> ... </code> tags. Please see "Markup in the Monastery", "Writeup Formatting Tips" and "How do I post a question effectively?".

        How would I incorporate the code [BrowserUk] suggested ...

        One way would be to copy the contents of the  m// given by BrowserUk into a  qr// regex object as you originally had it, and then interpolate the  qr// into a  m// match as in your OP (I'm using  \z in place of  $ to match absolute end-of-string):

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my @tests = qw[
           10-K 10-KSB 10-K405 10-KSB405 10-Q
           10-K/B 10-KA/ 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A
           ];
         ;;
         my $formget = qr{ 10- [KQ] [^/]* \z }xms;
         ;;
         print qq{'$_': }, $_ =~ m[^$formget] ? 'matches' : 'does not match'
           for @tests;
        "
        '10-K': matches
        '10-KSB': matches
        '10-K405': matches
        '10-KSB405': matches
        '10-Q': matches
        '10-K/B': does not match
        '10-KA/': does not match
        '10-KSB/ABC': does not match
        '10-K405/A': does not match
        '10-KSB405/A': does not match
        '10-Q/A': does not match
        Please see perlre, perlrequick and perlretut.

        Also: What version of Perl are you working with? It may be useful to know this for future reference.

        Update: I forgot about case insensitivity. You can add this with an  /i modifier:
            my $formget = qr{ 10- [KQ] [^/]* \z }xmsi;
        or, better, IMHO, for stylistic reasons,  (?i)
            my $formget = qr{ (?i) 10- [KQ] [^/]* \z }xms;
        or, best of all,
            my $formget = qr{ 10- [KkQq] [^/]* \z }xms;
        because it avoids the  /i performance hit if you are processing very many strings or very long strings.
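
        To compare the three case-insensitivity variants above side by side, here is a minimal sketch. The %variant hash, its key names and the accepted_by helper are made up for this illustration; only the three qr// patterns come from the update above.

```perl
use strict;
use warnings;

# The three variants from the update above, keyed by a made-up name.
my %variant = (
    flag   => qr{ 10- [KQ] [^/]* \z }xmsi,        # /i modifier on the qr//
    inline => qr{ (?i) 10- [KQ] [^/]* \z }xms,    # (?i) inside the pattern
    class  => qr{ 10- [KkQq] [^/]* \z }xms,       # explicit character class
);

# Hypothetical helper: names of the variants that accept $form_type,
# in sorted order.
sub accepted_by {
    my ($form_type) = @_;
    my @names;
    for my $name (sort keys %variant) {
        my $re = $variant{$name};
        push @names, $name if $form_type =~ m{ \A $re }xms;
    }
    return @names;
}

for my $form_type (qw[ 10-K 10-q 10-ksb 10-KSB405 10-k/a ]) {
    my @names = accepted_by($form_type);
    printf "%-10s %s\n", $form_type,
        @names ? "accepted by: @names" : 'rejected';
}
```

        For these sample strings all three variants should agree; the difference between them is style and, possibly, speed, not which strings are accepted.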

Re: Pattern matching exclusion
by Loops (Curate) on Nov 29, 2014 at 02:42 UTC

    You don't specify which characters are valid as part of the identifier, so the following assumes that any letter or number is okay. From your code snippet, it looked as if you wanted to allow either upper or lower case matching as well:

    my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;

    while (my $form_type = <DATA>) {
        if ($form_type =~ /^$formget/ ) {
            print "Valid: $form_type";
        } else {
            print "Invalid: $form_type";
        }
    }
    __DATA__
    10-K
    10-KSB
    10-K405
    10-ksb405
    10-Q
    10-K/A
    10-Q/A

    Output:

    Valid: 10-K
    Valid: 10-KSB
    Valid: 10-K405
    Valid: 10-ksb405
    Valid: 10-Q
    Invalid: 10-K/A
    Invalid: 10-Q/A

    Check out the section titled "Look-Around Assertions" in perlre about using the (?! ...) construct to specify what's called a negative lookahead. That is, something which must not follow the current pattern.
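
    As a small illustration of what a negative lookahead checks, here is a sketch with a deliberately tiny pattern. The helper name k_not_followed_by_slash is made up for this example. Note that (?! / ) only inspects the position immediately after the text matched so far; it says nothing about a slash further along in the string.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string starts with '10-K' and the
# very next character (if there is one) is not a slash.
sub k_not_followed_by_slash {
    my ($text) = @_;
    return $text =~ m{ \A 10-K (?! / ) }xms ? 1 : 0;
}

for my $text (qw[ 10-K 10-KSB 10-K/A 10-KSB/A ]) {
    printf "%-9s %s\n", $text,
        k_not_followed_by_slash($text) ? 'matches' : 'does not match';
}
```

    Here 10-K/A is rejected because the slash comes directly after the K, while 10-KSB/A is accepted because the character after the K is an S.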

      my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;

      The problem with this regex when used in a /^$formget/ match is that it allows strings like 10-KA/ to match: if a / is found at the end of 10-KA/, the regex can backtrack to 10-K and look ahead to A, which is not a / character! BrowserUk's approach (elsewhere in this thread) gets around this by using an end-of-string anchor assertion to make sure that only non-/ characters follow the [KQ]. If anchoring were not possible, another way would be to use an 'atomic' (?>pattern) group, which will not allow backtracking into it:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my @tests = qw[
         10-K 10-KSB 10-K405 10-KSB405 10-Q
         10-K/B 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A 10-KA/
         ];
       ;;
       my $formget = qr{ (?i) 10- [KQ] [A-Z0-9]* (?! /) }xms;
       printf 'valid:  ';
       /^$formget/ and printf qq{'$_' } for @tests;
       print '';
       ;;
       my $formget2 = qr{ (?> (?i) 10- [KQ] [A-Z0-9]*) (?! /) }xms;
       printf 'valid2: ';
       /^$formget2/ and printf qq{'$_' } for @tests;
      "
      valid:  '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q' '10-KSB/ABC' '10-K405/A' '10-KSB405/A' '10-KA/'
      valid2: '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q'
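
      A related option, not used in the posts above, is a possessive quantifier. This is only a sketch and assumes Perl 5.10 or later, where *+ is available; the names $formget3 and valid_form are made up for this example. Like the atomic group, [A-Z0-9]*+ refuses to give back characters once it has matched them, so the lookahead cannot be satisfied by backtracking.

```perl
use strict;
use warnings;

# Assumption: Perl 5.10+ (possessive quantifiers). The *+ quantifier
# matches as many characters as it can and never backtracks into them.
my $formget3 = qr{ (?i) 10- [KQ] [A-Z0-9]*+ (?! /) }xms;

# Hypothetical helper wrapping the anchored match.
sub valid_form {
    my ($form_type) = @_;
    return $form_type =~ m{ \A $formget3 }xms ? 1 : 0;
}

my @tests = qw[
    10-K 10-KSB 10-K405 10-KSB405 10-Q
    10-K/B 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A 10-KA/
];

for my $form_type (@tests) {
    printf "%-12s %s\n", $form_type,
        valid_form($form_type) ? 'valid' : 'invalid';
}
```

      As with the atomic group, nothing here anchors the end of the string, so this only guards against a slash directly after the run of letters and digits.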

        Thank you AnomalousMonk, that makes sense. I really appreciate your time and interest. Have a great weekend!
      Hi Loops, Thank you for your help. Tell me if I am wrong concerning the interpretation of your expression: qr(10-[KQ][A-Z0-9]*(?!/))i;
      1. You are saying find a string (qr)
      2. which begins with 10-K or 10-Q: [KQ]
      3. followed by any letter from A-Z: [A-Z]
      4. followed by any number 0-9: [0-9]
      5. excluding any string which terminates with slash: *(?!/))i
      Is that right? If so, how would the expression change if I were looking for upper case letters only? Your insight is invaluable, and I am grateful! Rick

        The [A-Z0-9] part of the expression is a single character class that matches any one letter or digit (not a letter followed by a number). It is followed by an asterisk "*", which means zero or more instances, so "1B3" and "K4R" both match at that point in the pattern.

        To use upper case only, remove the trailing "i" from the initial $formget assignment.

Re: Pattern matching exclusion
by poj (Abbot) on Nov 29, 2014 at 14:17 UTC
    Does $formget represent the string that you are matching the pattern against? If so, what does $form_type represent?
      $formget represents the user assigned string to retrieve. $form_type represents the stored string. Make sense?
        Make sense? Not really, some examples of each might help.
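
        In the absence of examples, here is one possible reading as a sketch only. It assumes $formget is a plain user-supplied string such as '10-K' and $form_type is the stored value being tested; the helper name wanted_form is made up. Under that assumption, the user's string can be quoted with \Q...\E and combined with the no-slash, end-of-string check suggested earlier in the thread.

```perl
use strict;
use warnings;

# Assumption: $formget is a plain string chosen by the user (e.g. '10-K'),
# and $form_type is the stored form type to test against it.
sub wanted_form {
    my ($form_type, $formget) = @_;

    # \Q...\E quotes any regex metacharacters in the user's string;
    # [^/]* \z then rejects a slash anywhere in the rest of the value;
    # /i keeps the case-insensitive behaviour of the original test.
    return $form_type =~ m{ \A \Q$formget\E [^/]* \z }xmsi ? 1 : 0;
}

for my $form_type (qw[ 10-K 10-KSB 10-k405 10-K/A 10-Q ]) {
    printf "%-8s %s\n", $form_type,
        wanted_form($form_type, '10-K') ? 'keep' : 'skip';
}
```

        If $formget is meant to hold something other than a literal prefix, the \Q...\E quoting would need to be reconsidered.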
