isha has asked for the wisdom of the Perl Monks concerning the following question:
I want a regular expression that checks that a string starts with "aaa_" and has only 4 digits at the end. In between, the string can have 1 to 24 digits, letters and underscores ({1,24}).
I tried with
$test = "00000";
if($test =~ /^aaa_[a-z0-9_]{1,24}[0-9]{4,4}$/){
print "Match---\n";
}
else{
print "No Match... :(\n";
}
But this says it's a match.
Please provide a regexp that says:
Match for "aaa_gjh1_dfgdf_0009","aaa_gjh_0000"
No Match for "aaa_fdsfs_000", "aaa_sdf_jdsh_01111".
i.e. allows ONLY 4 digits at the end of a string
Re: Number of digits at the end of the string
by grinder (Bishop) on Jan 17, 2008 at 13:27 UTC
If you mean that a string must have four digits at the end, that is, the four digits are preceded by a non-digit, then you can use a positive look-behind assertion to look at the character before that group.
Since we are dealing with strings, and not lines, you should use string anchors (\A and \z). Working backwards, \d{4} gives us 4 digits. To see if the character before that is a non-digit? (?<=\D) (mnemonic: before equals a non-digit). This looks at the character, but does not consume it.
Then, alphanum characters and underscore can be represented by \w. As an added bonus (or possibly not, that's your call) it will work correctly on UTF-8 characters that represent alphanumish characters.
Putting it all together, we get
sub is_valid {
my $test = shift;
return $test =~ /\Aaaa_\w{1,24}(?<=\D)\d{4}\z/;
}
I naturally tend to wrap this logic up in a routine, so that in the mainline code I can write
for my $str (qw(aaa_gjh1_dfgdf_0009 aaa_gjh_0000 aaa_fdsfs_000 aaa_sdf_jdsh_01111)) {
print (is_valid($str) ? "ok $str\n" : "not ok $str\n");
}
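A small sketch of why the string anchors matter here. Without /m, $ also matches just before a trailing newline, while \z only matches at the very end of the string. The variable names are made up for this illustration:

```perl
use strict;
use warnings;

# Same pattern, once with a line-style end anchor and once with \z.
my $with_dollar = qr/\Aaaa_\w{1,24}(?<=\D)\d{4}$/;
my $with_z      = qr/\Aaaa_\w{1,24}(?<=\D)\d{4}\z/;

for my $str ("aaa_gjh_0000", "aaa_gjh_0000\n") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make the newline visible in the output
    my $dollar = $str =~ $with_dollar ? 'yes' : 'no';
    my $z      = $str =~ $with_z      ? 'yes' : 'no';
    print "$shown  dollar-anchor: $dollar  z-anchor: $z\n";
}
```

If the input comes from a file and has not been chomped, the $ version would quietly accept it, which may or may not be what you want.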
• another intruder with the mooring in the heart of the Perl
Re: Number of digits at the end of the string
by hipowls (Curate) on Jan 17, 2008 at 12:43 UTC
/\D\d{4}$/
if an acceptable string could be exactly four digits then
/
(?: # either
^ # start of string
| # or
\D # a non digit
)
\d{4} # exactly four digits
$ # end of string
/x
Update: /(?<!\d)\d{4}$/ covers all cases. I tend to forget about look-behind because I have to use fairly ancient perls as well as the more modern variants, and so code for the lowest common denominator.
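To make the difference between the two forms concrete, here is a minimal sketch (variable names are invented for the example). The negative look-behind is satisfied at the start of the string, where there is no preceding character at all, whereas \D needs an actual character to match:

```perl
use strict;
use warnings;

my $needs_non_digit = qr/\D\d{4}$/;         # a non-digit must precede the digits
my $no_digit_before = qr/(?<!\d)\d{4}$/;    # anything but a digit (or nothing) precedes

for my $str (qw(0000 x0000 00000 aaa_sdf_jdsh_01111)) {
    my $first  = $str =~ $needs_non_digit ? 'yes' : 'no';
    my $second = $str =~ $no_digit_before ? 'yes' : 'no';
    print "$str  \\D form: $first  look-behind form: $second\n";
}
```

Only the bare "0000" should be treated differently by the two patterns.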
Re: Number of digits at the end of the string
by apl (Monsignor) on Jan 17, 2008 at 12:46 UTC
How do you define ONLY 4 digits at the end of a string ? "aaa_sdf_jdsh_01111" would seem to me to end in four digits.
If you're saying the string must always end in an underscore followed by four digits, try testing against /_\d{4}$/.
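A quick sketch of what that test does and does not check. The last string in the list is a hypothetical extra case with no underscore before the digits:

```perl
use strict;
use warnings;

my $underscore_then_four = qr/_\d{4}$/;

for my $str (qw(aaa_gjh1_dfgdf_0009 aaa_gjh_0000 aaa_fdsfs_000 aaa_sdf_jdsh_01111 aaa_gjhi0000)) {
    my $verdict = $str =~ $underscore_then_four ? 'match' : 'no match';
    print "$verdict: $str\n";
}
```

Note that this pattern on its own says nothing about the "aaa_" prefix or the length of the middle part, so it would need to be combined with those checks.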
|
Perhaps the OP means that the string must end in exactly 4 digits, i.e. the last 5 chars must not all be digits.
#!C:/perl/bin -w
=head
I want a regular expression that checks that a string has only 4 digits at the end & starts with "aaa_", inbetween string can have digits,alphabets and underscore{1,24} .
...
Please provide a regexp that says:
Match for "aaa_gjh1_dfgdf_0009","aaa_gjh_0000"
No Match for "aaa_fdsfs_000", "aaa_sdf_jdsh_01111".
i.e. allows ONLY 4 digits at the end of a string
=cut
my @testitems = qw( aaa_gjh1_dfgdf_0009 aaa_gjh_0000
aaa_gjh1_DFgdf_0009 aaa_gjh_0000
aaa_fdsfs_000
aaa_sdf_jdsh_01111
aaa_gjh1_dfgdf_0009
aaa_gjh1_df<:df_0009
aaa_gjh_0000 aaa_fdsfs_000 );
for my $item(@testitems) {
if ( $item =~ m/
^aaa_ # starts with "aaa_"
[_A-Za-z0-9]{1,24}  # next 1 to 24 chars: underline, alpha or decimal_nums
(?<=\D)             # Positive_Lookbehind: the last of those chars (the one
                    # BEFORE the last 4) can't be a digit, else we may
                    # find 5 digits at the end; it consumes nothing
\d{4}$              # has ONLY - emphasis mine - four digits at the end
/x ) {
print "Matches: $item\n";
} else {
print "\tNo match: $item\n";
}
}
print "done\n";
=head1 OUTPUT:
Matches: aaa_gjh1_dfgdf_0009
Matches: aaa_gjh_0000
Matches: aaa_gjh1_DFgdf_0009
Matches: aaa_gjh_0000
No match: aaa_fdsfs_000
No match: aaa_sdf_jdsh_01111
Matches: aaa_gjh1_dfgdf_0009
No match: aaa_gjh1_df<:df_0009
Matches: aaa_gjh_0000
No match: aaa_fdsfs_000
done
=cut
|
Not only is the above badly formatted, it occurred to me to test additional possibilities,
so (with apologies), revised, cleaned up, etc.:
#!C:/perl/bin -w
my @testitems = qw( aaa_gjh1_dfgdf_0009
aaa_gjh_0000
aaa_gjh1_DFgdf_0009
aaa_fdsfs_000
aaa_sdf_jdsh_01111
aaa_gjh1_dfgdf_0009
aaa_gjhi0000
aaa_gjh1_df<:df_0009
aaa_gjh_0000
aaa_ABCDEFGHI123456789ZYXW_1111
aaa_ABCDEFGHI123456789ZYXWVUT_1111);
# next to last: 23 intermediate chars (22 before the last underline)
# last element: more than 24 intermediate chars
for my $item(@testitems) {
if ( $item =~ m/
^aaa_ # starts with "aaa_"
[_A-Za-z0-9]{1,24}  # next 1 to 24 chars: underline, alphas or decimal_nums
(?<=\D)             # Pos_LookBehind: the last of those chars (the one
                    # BEFORE the last 4) can't be a digit, else we may
                    # find 5 digits at the end; it consumes nothing
\d{4}$              # has ONLY - emphasis mine - four digits at the end
/x ) {
print "MATCHES: $item\n";
} else {
print "NO match: $item\n";
}
}
print "done\n";
=head1 OUTPUT:
MATCHES: aaa_gjh1_dfgdf_0009
MATCHES: aaa_gjh_0000
MATCHES: aaa_gjh1_DFgdf_0009                  # caps are alpha; OP may want only lc
NO match: aaa_fdsfs_000                       # only 3 digits at end of string (contrary to spec)
NO match: aaa_sdf_jdsh_01111                  # 5 digits at end of string (contrary to spec)
MATCHES: aaa_gjh1_dfgdf_0009
MATCHES: aaa_gjhi0000                         # no underscore before final 4 digits (ok per spec)
NO match: aaa_gjh1_df<:df_0009                # includes symbol and punct (contrary to spec)
MATCHES: aaa_gjh_0000
MATCHES: aaa_ABCDEFGHI123456789ZYXW_1111      # < 24 chars before final 4 digits (ok per spec)
NO match: aaa_ABCDEFGHI123456789ZYXWVUT_1111  # > 24 chars before final 4 digits (contrary to spec)
done
=cut
But see grinder's elegant (and earlier) node, below!
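Because a look-behind consumes nothing, the quantifier in front of it alone decides how many characters may sit between "aaa_" and the final digits. A sketch to probe that boundary, using made-up strings of the form "aaa_" plus N letters plus "0000" (the hash and its labels are only for this illustration):

```perl
use strict;
use warnings;

# Identical patterns apart from the upper bound of the quantifier.
my %pattern = (
    '{1,23}' => qr/\Aaaa_[_A-Za-z0-9]{1,23}(?<=\D)\d{4}\z/,
    '{1,24}' => qr/\Aaaa_[_A-Za-z0-9]{1,24}(?<=\D)\d{4}\z/,
);

for my $n (1, 23, 24, 25) {
    my $str = 'aaa_' . ('x' x $n) . '0000';    # hypothetical test string
    for my $label (sort keys %pattern) {
        my $verdict = $str =~ $pattern{$label} ? 'match' : 'no match';
        print "middle length $n, $label: $verdict\n";
    }
}
```

A middle part of exactly 24 characters is accepted only with {1,24}, which is the limit the OP asked for.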
Re: Number of digits at the end of the string
by memnoch (Scribe) on Jan 17, 2008 at 13:13 UTC
If I understand your requirements correctly, the following should work. (It works with your example strings as you have provided.)
use strict;
my @strings = qw(aaa_gjh1_dfgdf_0009 aaa_gjh_0000 aaa_fdsfs_000 aaa_sdf_jdsh_01111);
foreach my $test (@strings) {
if($test =~ /^aaa_[a-z0-9_]{0,23}[a-z_][0-9]{4}$/){   # {0,23} plus one non-digit: 1 to 24 chars in between
print "Match---\n";
}
else{
print "No Match... :(\n";
}
}
Re: Number of digits at the end of the string
by toolic (Bishop) on Jan 17, 2008 at 14:26 UTC
I'm confused. When I run the code that you posted, it prints (contrary to your claim):
No Match... :(
Perhaps you updated your original post?
|
grinder's solution works for me (grinder++). Here's my test against your strings:
#!/usr/bin/perl
use strict;
use warnings;
my @strings = qw/aaa_gjh1_dfgdf_0009 aaa_gjh_0000 00000 aaa_fdsfs_000 aaa_sdf_jdsh_01111/;
print is_valid($_) ? '' : "no ", "match: $_\n" for @strings;
sub is_valid {
my $test = shift;
return $test =~ /\Aaaa_\w{1,24}(?<=\D)\d{4}\z/;
}
|
Thanks for the solution, but this still says Match for aaa_fgjkghd11_hfksdh11__0000 (two underscores).
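If the extra requirement is that two underscores may never appear next to each other (an assumption; it was not part of the original question), one possible sketch is to put a negative look-ahead in front of grinder's pattern. The sub name is invented for this example:

```perl
use strict;
use warnings;

# Sketch only: assumes "__" is not allowed anywhere in the string.
sub is_valid_no_double_underscore {
    my $test = shift;
    # (?!.*__) fails the match if "__" occurs anywhere after the start.
    return $test =~ /\A(?!.*__)aaa_\w{1,24}(?<=\D)\d{4}\z/ ? 1 : 0;
}

for my $str (qw(aaa_fgjkghd11_hfksdh11__0000 aaa_fgjkghd11_hfksdh11_0000 aaa_gjh_0000)) {
    my $verdict = is_valid_no_double_underscore($str) ? 'match' : 'no match';
    print "$verdict: $str\n";
}
```

Keeping the "no double underscore" rule in its own look-ahead leaves the length limit {1,24} counting characters, as before.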
Re: Number of digits at the end of the string
by poolpi (Hermit) on Jan 18, 2008 at 11:35 UTC
#!/usr/bin/perl
use strict;
use warnings;
my @string = (
'aaa_fgjkghd11__hfksdh11__0000', 'aaa_fgjkghd11_hfksdh11__0000',
'aaa_fgjkghd11hfksdh11__0000', 'aaa_fgjkghd11__hfksdh11_0000',
'aaa_fgjkghd11__hfksdh110000', 'aaa_fgjk__ghd11_hfksdh1_10000',
'aaa_fgjk__ghd11_hfksdh1_0000', 'aa_fgjk__ghd11_hfksdh1_a000',
'aaa_fgjk_ghd_11_hfksdh1_0000'
);
for (@string) {
print /\A
a{3}
[_]
(?: [a-z0-9] | [^_][_] ){1,24}
[^_][_]
\d{4}
\z
/xms ? ' ' : 'no ';
print "match : $_\n";
}
Output:
no match : aaa_fgjkghd11__hfksdh11__0000
no match : aaa_fgjkghd11_hfksdh11__0000
no match : aaa_fgjkghd11hfksdh11__0000
no match : aaa_fgjkghd11__hfksdh11_0000
no match : aaa_fgjkghd11__hfksdh110000
no match : aaa_fgjk__ghd11_hfksdh1_10000
no match : aaa_fgjk__ghd11_hfksdh1_0000
no match : aa_fgjk__ghd11_hfksdh1_a000
match : aaa_fgjk_ghd_11_hfksdh1_0000
HTH,
PooLpi