Re: Pattern matching exclusion
by BrowserUk (Patriarch) on Nov 29, 2014 at 02:45 UTC
|
@tests = qw[ 10-K 10-KSB 10-K405 10-KSB405 10-Q 10-K/B 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A];;
printf "%s : %s\n", $_, $_ =~ m[10-[KQ][^/]*$] ? 'matches.' : 'does not match.' for @tests;;
10-K : matches.
10-KSB : matches.
10-K405 : matches.
10-KSB405 : matches.
10-Q : matches.
10-K/B : does not match.
10-KSB/ABC : does not match.
10-K405/A : does not match.
10-KSB405/A : does not match.
10-Q/A : does not match.
|
Hi BrowserUk,
Yup, you nailed it! Exactly what I'm looking for.
I am most grateful for your willingness to help.
Can't thank you enough!
Happy Holidays!
Rick
|
Hi BrowserUk,
Question: my current pattern matching line of code looks like this:
if ($form_type=~/^$formget/i);
How would I incorporate the code you suggested in your post?
Presumably, I would add m[10-KQ^/*$] , but I am unsure how the syntax plays out.
Thank you!!
|
... m[10-KQ^/*$] ...
Huh?!? You really need to start using <code> ... </code> tags. Please see Markup in the Monastery, Writeup Formatting Tips, How do I post a question effectively?.
How would I incorporate the code [BrowserUk] suggested ...
One way would be to copy the contents of the m// given by BrowserUk into a qr// regex object as you originally had it, and then interpolate the qr// into a m// match as in your OP (I'm using \z in place of $ to match absolute end-of-string):
c:\@Work\Perl\monks>perl -wMstrict -le
"my @tests = qw[
10-K 10-KSB 10-K405 10-KSB405 10-Q
10-K/B 10-KA/ 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A
];
;;
my $formget = qr{ 10- [KQ] [^/]* \z }xms;
;;
print qq{'$_': }, $_ =~ m[^$formget] ? 'matches' : 'does not match' for @tests;
"
'10-K': matches
'10-KSB': matches
'10-K405': matches
'10-KSB405': matches
'10-Q': matches
'10-K/B': does not match
'10-KA/': does not match
'10-KSB/ABC': does not match
'10-K405/A': does not match
'10-KSB405/A': does not match
'10-Q/A': does not match
Please see perlre, perlrequick and perlretut.
Also: What version of Perl are you working with? It may be useful to know this for future reference.
Update: I forgot about case insensitivity. You can add this with an /i modifier:
my $formget = qr{ 10- [KQ] [^/]* \z }xmsi;
or, better, IMHO, for stylistic reasons, (?i)
my $formget = qr{ (?i) 10- [KQ] [^/]* \z }xms;
or, best of all,
my $formget = qr{ 10- [KkQq] [^/]* \z }xms;
because it avoids the /i performance hit if you are processing very many strings or very long strings.
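As a quick sanity check, the three variants can be run side by side to confirm they accept the same strings. This is only a sketch: the %variants hash, the %result hash and the mixed-case test strings are made up for illustration, and it says nothing about the relative speed of the three forms.

```perl
use strict;
use warnings;

# The three case-insensitive variants from the update.
# %variants, %result and @tests are illustrative names, not from the thread.
my %variants = (
    'flag /i'     => qr{ 10- [KQ] [^/]* \z }xmsi,
    'inline (?i)' => qr{ (?i) 10- [KQ] [^/]* \z }xms,
    'char class'  => qr{ 10- [KkQq] [^/]* \z }xms,
);

my @tests = qw[ 10-K 10-k 10-q 10-ksb405 10-K/A 10-q/a ];

my %result;
for my $name (sort keys %variants) {
    my $re = $variants{$name};
    # Collect the test strings this variant accepts.
    $result{$name} = join ' ', grep { /^$re/ } @tests;
    print "$name: $result{$name}\n";
}
```

Each variant should list the same four slash-free strings. The character-class form only needs to spell out K and Q because nothing else in the pattern is a literal letter.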
Re: Pattern matching exclusion
by Loops (Curate) on Nov 29, 2014 at 02:42 UTC
|
You don't specify which characters are valid as part of the identifier, so the following assumes that any letter or number is okay. From your code snippet, it looked as if you wanted to allow either upper or lower case matching as well:
my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;
while (my $form_type = <DATA>) {
    if ($form_type =~ /^$formget/ ) {
        print "Valid: $form_type";
    } else {
        print "Invalid: $form_type";
    }
}
__DATA__
10-K
10-KSB
10-K405
10-ksb405
10-Q
10-K/A
10-Q/A
Output:
Valid: 10-K
Valid: 10-KSB
Valid: 10-K405
Valid: 10-ksb405
Valid: 10-Q
Invalid: 10-K/A
Invalid: 10-Q/A
Check out the section titled "Look-Around Assertions" in perlre about using the (?! ...) construct to specify what's called a negative lookahead. That is, something which must not follow the current pattern.
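To make the "must not follow" idea concrete, here is a minimal sketch showing that a negative lookahead is zero-width: it inspects the next character without consuming it. The simplified pattern, the test strings and the %consumed hash are assumptions chosen for illustration, not code from the thread.

```perl
use strict;
use warnings;

# A negative lookahead (?!...) checks what comes next without consuming it.
# The capture group records what the pattern actually matched.
my $re = qr{ \A (10-K) (?! /) }xms;

my %consumed;    # illustrative name: string => text matched by the group
for my $str (qw[ 10-K 10-KSB 10-K/A ]) {
    if ($str =~ $re) {
        $consumed{$str} = $1;
        print "'$str' matches, consumed '$1'\n";
    }
    else {
        print "'$str' does not match\n";
    }
}
```

Note that '10-KSB' matches while consuming only '10-K': the lookahead merely confirmed that the next character was not a slash.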
|
my $formget = qr(10-[KQ][A-Z0-9]*(?!/))i;
The problem with this regex when used in a /^$formget/ match is that it allows strings like 10-KA/ to match: if a / is found at the end of 10-KA/ the regex can backtrack to 10-K and look ahead to A which is not a / character! BrowserUk's approach below gets around this by using an end-of-string anchor assertion to make sure that only non-/ characters follow the [KQ]. If anchoring were not possible, another way would be to use an 'atomic' (?>pattern) group which will not allow backtracking into it:
c:\@Work\Perl\monks>perl -wMstrict -le
"my @tests = qw[
10-K 10-KSB 10-K405 10-KSB405 10-Q
10-K/B 10-KSB/ABC 10-K405/A 10-KSB405/A 10-Q/A 10-KA/
];
;;
my $formget = qr{ (?i) 10- [KQ] [A-Z0-9]* (?! /) }xms;
printf 'valid: ';
/^$formget/ and printf qq{'$_' } for @tests;
print '';
;;
my $formget2 = qr{ (?> (?i) 10- [KQ] [A-Z0-9]*) (?! /) }xms;
printf 'valid2: ';
/^$formget2/ and printf qq{'$_' } for @tests;
"
valid: '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q' '10-KSB/ABC' '10-K405/A' '10-KSB405/A' '10-KA/'
valid2: '10-K' '10-KSB' '10-K405' '10-KSB405' '10-Q'
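To see the backtracking directly, one can add a capture group and print how much of each string the two patterns consumed. This is a sketch under assumptions: the capture groups, the show() helper and the %seen hash are additions for illustration and are not part of the one-liner.

```perl
use strict;
use warnings;

# The same two patterns, with a capture added so the consumed text is visible.
my $plain  = qr{ \A ( (?i) 10- [KQ] [A-Z0-9]* ) (?! /) }xms;
my $atomic = qr{ \A ( (?> (?i) 10- [KQ] [A-Z0-9]* ) ) (?! /) }xms;

# Illustrative helper: format a captured string or report failure.
sub show {
    my ($m) = @_;
    return defined $m ? "'$m'" : 'no match';
}

my %seen;
for my $str (qw[ 10-KSB 10-KA/ 10-KSB/ABC ]) {
    $seen{plain}{$str}  = $str =~ $plain  ? $1 : undef;
    $seen{atomic}{$str} = $str =~ $atomic ? $1 : undef;
    printf "%-12s plain: %-10s atomic: %s\n",
        $str, show($seen{plain}{$str}), show($seen{atomic}{$str});
}
```

For '10-KA/' the plain pattern gives back the A and settles for '10-K', whereas the atomic group refuses to give anything back and the match fails.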
|
Thank you AnomalousMonk, that makes sense. I really appreciate your time and interest. Have a great weekend!
|
Hi Loops,
Thank you for your help.
Tell me if I am wrong concerning the interpretation of your expression: qr(10-[KQ][A-Z0-9]*(?!/))i;
1. You are saying find a string (QR)
2. which begins with 10-K or 10-Q [KQ]
3. followed by any letter from A-Z [A-Z]
4. followed by any number 0-9 [0-9]
5. excluding any string which terminates with slash *(?!/))i
Is that right?
If so, how would the expression change if I were looking for upper case letters only?
Your insight is invaluable, and I am grateful!
Rick
|
The [A-Z0-9] part of the expression matches any letter or any number. And it's followed by an asterisk "*", which means zero or more instances. So "1B3" and "K4R" both match at that point in the pattern.
To use upper case only, remove the trailing "i" from the initial $formget assignment.
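A small sketch of the difference, assuming a handful of made-up test strings: the pattern is the one posted above, once with the trailing "i" and once without. The variable names and the %valid hash are illustrative only, and the lookahead is kept exactly as posted.

```perl
use strict;
use warnings;

# The posted pattern with and without the trailing /i flag.
my $any_case   = qr(10-[KQ][A-Z0-9]*(?!/))i;
my $upper_only = qr(10-[KQ][A-Z0-9]*(?!/));

my %valid;    # illustrative name: results per pattern and per string
for my $form_type (qw[ 10-K 10-KSB 10-ksb405 10-q ]) {
    $valid{any}{$form_type}   = $form_type =~ /^$any_case/   ? 1 : 0;
    $valid{upper}{$form_type} = $form_type =~ /^$upper_only/ ? 1 : 0;
    print "$form_type: any-case=$valid{any}{$form_type} ",
          "upper-only=$valid{upper}{$form_type}\n";
}
```

Without the flag, the lower-case strings are rejected at the [KQ] step, before the [A-Z0-9]* part is even tried.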
|
Re: Pattern matching exclusion
by poj (Abbot) on Nov 29, 2014 at 14:17 UTC
|
Does $formget represent the string that you are matching the pattern against? If so, what does $form_type represent?
|
$formget represents the user assigned string to retrieve.
$form_type represents the stored string.
Make sense?
|
Make sense?
Not really, some examples of each might help.