Efficient way to verify scalar contents

x_stream3m has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Efficient way to verify scalar contents (updated x3) by haukex (Archbishop) on Jun 22, 2020 at 07:05 UTC
See also How to ask better questions using Test::More and sample data. You can do it all in a single regex, see perlretut (and perlre) in regards to alternations, character classes, and lookaheads. However, perhaps my test cases below will give a hint that these password rules aren't necessarily good indicators of password quality; perhaps use one of the other established methods like Data::Password::zxcvbn instead. If this happens to be for a website, note there's also a JavaScript version of the "zxcvbn" algorithm that allows you to give live feedback to the user when and why a password isn't good, so they can know while choosing one; of course it should still be verified on the server in case they have JS disabled. Update: Note that `given`/`when` are unforunately still experimental, in this case I'd suggest normal `if`s instead. Update 2: The talk on the library is great: https://youtu.be/vf37jh3dV2I (even just the first five minutes on the issues) Update 3: A slightly optimized version of the regex below, that turns out to be pretty fast: `my $pw_re = qr{ \A (?: .{20,} \| (?=.[a-z]) (?=.[A-Z]) (?: .{16,19} \| (?=.[0-9]) (?: .{12,15} \| (?=.[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]) .{8,11} ) ) ) \z }msx;` [download] Original code: use warnings; use strict; my $pw_re = qr{ \A (?: .{20,} \| (?=.[a-z]) (?=.[A-Z]) .{16,19} \| (?=.[a-z]) (?=.[A-Z]) (?=.[0-9]) .{12,15} \| (?=.[a-z]) (?=.[A-Z]) (?=.[0-9]) (?=.*[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]) .{8,11} ) \z }msx; use Test::More; unlike '', $pw_re; unlike 'aA0-', $pw_re; unlike 'aA0-aA0', $pw_re; like 'aA0-aA0-', $pw_re; like 'aaaaaA0-', $pw_re; like 'aaaaaaaaA0-', $pw_re; unlike 'aaaaaaaaa0-', $pw_re; unlike 'aaaaaaaaAA-', $pw_re; unlike 'AAAAAAAAA0-', $pw_re; unlike 'aaaaaaaaA00', $pw_re; like 'aaaaaaaaaaA0', $pw_re; like 'aaaaaaaaa-A0', $pw_re; like 'aaaaaaaaaaaaaA0', $pw_re; unlike 'aaaaaaaaaaaaaaA', $pw_re; unlike 'aaaaaaaaaaaaaa0', $pw_re; unlike 'AAAAAAAAAAAAAA0', $pw_re; unlike 'aaaaaaaaaaaaa-A', $pw_re; unlike 'aaaaaaaaaaaaa-0', $pw_re; like 'aaaaaaaaaaaaaaaA', $pw_re; like 'aaaaaaaaaaaaaa0A', $pw_re; like 'aaaaaaaaaaaaa-0A', $pw_re; like 'aaaaaaaaaaaaaaaaaaA', $pw_re; unlike 'aaaaaaaaaaaaaaaaaaa', $pw_re; unlike 'AAAAAAAAAAAAAAAAAAA', $pw_re; unlike 'aaaaaaaaaaaaaaaaaa0', $pw_re; unlike 'aaaaaaaaaaaaaaaaaa-', $pw_re; like 'aaaaaaaaaaaaaaaaaaaa', $pw_re; like 'aaaaaaaaaaaaaaaaaA0-', $pw_re; like 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', $pw_re; done_testing; [download]	[reply] [d/l] [select]
Re^2: Efficient way to verify scalar contents (updated x3) by ikegami (Patriarch) on Jun 23, 2020 at 18:28 UTC
Note that given/when are unforunately still experimental That's a good thing. Their design is broken. The only reason they haven't been removed is that too many people are using them despite that problems with them X_X.	[reply]
Re^3: Efficient way to verify scalar contents (updated x3) by ikegami (Patriarch) on Jun 23, 2020 at 18:35 UTC
Correction, smart-matching is the one with the broken design. I think given/when's only problem is that they sometimes use smart-matching. But without smart-matching, there's not much point to given/when. For example, the OP would avoid given/when by making nothing but the following changes: given ⇒ for first when ⇒ if other when ⇒ elsif default ⇒ else	[reply]
Re^4: Efficient way to verify scalar contents (updated x3) by Anonymous Monk on Jun 23, 2020 at 19:29 UTC
Re: Efficient way to verify scalar contents by choroba (Cardinal) on Jun 22, 2020 at 09:17 UTC
`my @RE = (qr/[[:lower:]]/, qr/[[:upper:]]/, qr/[0-9]/, qr/[^[:alnum:]] +/); sub check { local ($_) = @_; my $length = length; return if $length < 8; return if $length < 12 && (! /$RE[0]/ \|\| ! /$RE[1]/ \|\| ! /$RE[2]/ +\|\| ! /$RE[3]/); return if $length < 16 && (! /$RE[0]/ \|\| ! /$RE[1]/ \|\| ! /$RE[2]/) +; return if $length < 20 && (! /$RE[0]/ \|\| ! /$RE[1]/); return 1 }` [download] Tested against the tests from Re: Efficient way to verify scalar contents (updated x3) (haukex++). Update: Or, if you want to abstract away the repeating `$RE[...]` `use List::Util qw{ any }; my @RE = (qr/[[:lower:]]/, qr/[[:upper:]]/, qr/[0-9]/, qr/[^[:alnum:]] +/); sub check { my ($password) = @_; my $length = length $password; return if $length < 8; return if $length < 12 && any { $password !~ $RE[$_] } 0 .. 3; return if $length < 16 && any { $password !~ $RE[$_] } 0 .. 2; return if $length < 20 && any { $password !~ $RE[$_] } 0 .. 1; return 1 }` [download] `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l] [select]
Re^2: Efficient way to verify scalar contents by LanX (Saint) on Jun 22, 2020 at 12:12 UTC
Please correct me, but do these character classes not depend on localisation? `[[:lower:]] [[:upper:]]` I think passwords should rather be defined in terms of ASCII Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l]
Re^3: Efficient way to verify scalar contents by choroba (Cardinal) on Jun 22, 2020 at 12:22 UTC
I'm not sure. I remember `[a-z]` depending on locale, but it was probably in bash rather than Perl... `$ (LC_ALL=en_US.UTF-8; [[ � =~ [a-z] ]] \|\| echo no) $ (LC_ALL=C; [[ � =~ [a-z] ]] \|\| echo no) no $ perl -Mutf8 -wE 'say "�" =~ /[[:lower:]]/' 1 $ perl -Mutf8 -wE 'say "�" =~ /[a-z]/' $` [download] Update: Switched from <c> to <pre> and back. `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l] [select]
Re: Efficient way to verify scalar contents by haukex (Archbishop) on Jun 22, 2020 at 13:29 UTC
I feel the current implementation is kind of redundant or inefficient? It turns out to be pretty good! Compared to the other solutions so far, it's only beaten by my second regex. `Rate LanX2 LanX1 choroba2 choroba1 haukex1 ori +g haukex2 LanX2 35040/s -- -49% -59% -75% -84% -87 +% -89% LanX1 68981/s 97% -- -19% -51% -69% -74 +% -79% choroba2 84861/s 142% 23% -- -39% -62% -68 +% -74% choroba1 139373/s 298% 102% 64% -- -37% -47 +% -57% haukex1 222355/s 535% 222% 162% 60% -- -15 +% -31% orig 261901/s 647% 280% 209% 88% 18% - +- -19% haukex2 323064/s 822% 368% 281% 132% 45% 23 +% --` [download] Updated benchmarks here! Read more... Benchmark code (10 kB)	[reply] [d/l] [select]
Re^2: Efficient way to verify scalar contents by LanX (Saint) on Jun 22, 2020 at 13:32 UTC
my goal was maintainability, if speed mattered, I'd move the declaration of the rules and rulesets out of the sub. redefining them each time is useless. Also using regex-refs via qr// is faster, but code-refs allow implementing more complex rules if requirements change. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^3: Efficient way to verify scalar contents by haukex (Archbishop) on Jun 22, 2020 at 13:44 UTC
my goal was maintainability, if speed mattered, I'd move the declaration of the rules and rulesets out of the sub. redefining them each time is useless. Good point, modified: `Rate choroba2 LanX2 choroba1 LanX1 haukex1 ori +g haukex2 choroba2 84800/s -- -8% -37% -38% -59% -68 +% -73% LanX2 91935/s 8% -- -32% -33% -56% -66 +% -71% choroba1 134606/s 59% 46% -- -2% -35% -50 +% -58% LanX1 136955/s 62% 49% 2% -- -34% -49 +% -57% haukex1 208522/s 146% 127% 55% 52% -- -22 +% -35% orig 267007/s 215% 190% 98% 95% 28% - +- -16% haukex2 318577/s 276% 247% 137% 133% 53% 19 +% --` [download] Read more... Benchmark code (9 kB)	[reply] [d/l] [select]
Re: Efficient way to verify scalar contents by perlfan (Vicar) on Jun 22, 2020 at 05:45 UTC
I would consider using something that will force you to organize your checks like Validate::Tiny. Besides that, I don't think your rules are sufficient. There are a few password checking modules out there, and at the very least they will give you an idea about what to check. Length makes no difference if it's all the same character, for example. See Data::Password and Data::Password::Entropy, just to name a couple.	[reply]
Re: Efficient way to verify scalar contents by BillKSmith (Monsignor) on Jun 22, 2020 at 14:51 UTC
Your "Do it yourself" approach can be simplified by testing the content before the length. `use strict; use warnings; sub is_pw_valid { local $_ = $_[0]; return !(/[a-z]/ && /[A-Z]/) ? length >= 20 : !/\d/ ? length >= 16 : !/[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/ ? length >= 12 : length >= 8 ; }` [download] UPDATE: Corrected error in regex for special characters and added (below) a less terse version of the same logic `sub is_pw_valid { my $password = $_[0]; my $MixedCase = qr/(:? [a-z].[A-Z]) \| (:? [A-Z].[a-z] )/x; my $Digit = qr/\d/; my $special = qr/[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/; my $min_length; if ($password !~ $MixedCase) {$min_length = 20} elsif ($password !~ $Digit) {$min_length = 16} elsif ($password !~ $special) {$min_length = 12} else {$min_length = 8} + my $is_OK = ( length($password) >= $min_length ); return $is_OK; }` [download] Bill	[reply] [d/l] [select]
Re: Efficient way to verify scalar contents by LanX (Saint) on Jun 22, 2020 at 04:59 UTC
Hi, some untested pseudo code I see it as a rules matrix to be checked. I'd take `$index= min( int(length /4), 5 )` And an array of arrays @rules for lookup. The rules array would contain subs which all must return true for the password to pass. Like `$letter = sub { /[a-z]/i }` (This will test against $_ ) So `$rules[3] = [ $letter, $number ]` `$rules[1] = [ $forbidden ]` All you need to do now is too loop over all rules and fail if one entry returns false. HTH! :) PS instead of code refs one might also store regex refs with qr//. But why blocking the way to more complicated rules? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^2: Efficient way to verify scalar contents by x_stream3m (Initiate) on Jun 22, 2020 at 05:15 UTC
Thanks for replying! A little confused on the purpose of `$index= min( int(length /4), 5 )` [download] length being the passwordLength?	[reply] [d/l]
Re^3: Efficient way to verify scalar contents by perlfan (Vicar) on Jun 22, 2020 at 05:48 UTC
It appears to be the index of the check based on length of the supplied password. You have 4 valid conditions that lead to additional checks; the 5th one is always invalid. Indexes 0, 1, 2, and 3 elements of the array of subroutine references he suggests will call the function appropriate for the length. `when ($_ >= 8 && $_ <= 11) # $index here is min( 2, 5), so 2 when ($_ >= 12 && $_ <= 15) { # $index here is min( 3, 5), so 3 when ($_ >= 16 && $_ <= 19) { # $index here is min( 4, 5), so 4 when ($_ >= 20) { # $index here is min( 5+,5), so alw +ays bounded at 5` [download] TBH it's not clear enough to make great sense to me, but he appears to be suggesting that you put your ranges into buckets based on the formula he gave; for a length of 20 or more, you're always going to fall into bucket 5 (or perhaps element 3) of your array of checks. This is actually a pretty good way to determine a bucket, so then you can have an array of checks; or to make it more clear, a hash: `my %check_dispatch = ( 2 => sub { ... }, # or better, define in named subs and, => \&check1 +, etc 3 => sub { ... }, 4 => sub { ... }, 5 => sub { ... }, ); my $pw = $_; my $index = min( (length $pw)/4, 5); if ($check_dispatch{$index} and 'CODE' eq ref $check_dispatch{$index}) + { die if not $check_dispatch{$index}->($pw); }` [download]	[reply] [d/l] [select]
Re^3: Efficient way to verify scalar contents by LanX (Saint) on Jun 22, 2020 at 11:40 UTC
As I said it was pseudocode =) I meant something like this: use strict; use warnings; use Data::Dump qw/pp dd/; use List::Util qw/min/; sub pw_not_ok { my $pw = shift; my $short = sub { "is too short" }; my $lowercase = sub { /[a-z]/ ? "" : "has no lowercase character" }; my $uppercase = sub { /[A-Z]/ ? "" : "has no uppercase character" }; ### FIXED my $number = sub { /[0-9]/ ? "" : "has no number" }; my $special = sub { if ( /[\x21-\x2F]/ \|\| /[\x3A-\x40]/ \|\| /[\x5B-\x60]/ \|\| /[\x7B-\x7E]/ ) { return ""; } else { return "has no special characters"; } }; my $pass = sub {""}; my @rules_for_class = ( [ $short ], [ $short ], [ $lowercase, $uppercase, $number ,$special ], [ $lowercase, $uppercase, $number ], [ $lowercase, $uppercase ], [ $pass ] ); my $len = length $pw; my $idx = min(int($len/4),5); for my $rule ( @{ $rules_for_class[$idx] } ) { $_ = $pw; if ( my $err = $rule->() ) { return $err; } } return; } # --------- Tests for my $pw ( "A" x 3 , "A" x 7 , "A" x 11, "a" x 5 . "A" x 3, "a" x 5 . "A" x 3 . "1", "a" x 5 . "A" x 3 . "1" .":", "a" x 24, ){ if ( my $err = pw_not_ok($pw) ){ warn "ERROR: $pw is $err\n"; } else { print "OK: $pw\n" } } [download] `ERROR: AAA is is too short ERROR: AAAAAAA is is too short ERROR: AAAAAAAAAAA is has no lowercase character ERROR: aaaaaAAA is has no number ERROR: aaaaaAAA1 is has no special characters OK: aaaaaAAA1: OK: aaaaaaaaaaaaaaaaaaaaaaaa` [download] Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} UPDATE see Re^6: Efficient way to verify scalar contents	[reply] [d/l] [select]
Re^4: Efficient way to verify scalar contents by LanX (Saint) on Jun 22, 2020 at 12:04 UTC
Re^5: Efficient way to verify scalar contents by haukex (Archbishop) on Jun 22, 2020 at 13:17 UTC
Some notes below your chosen depth have not been shown here
Re: Efficient way to verify scalar contents by tybalt89 (Monsignor) on Jun 23, 2020 at 22:27 UTC
Just for some TMTOWTDI fun :) `sub validate { local $_ = shift; length >= 20 ? 1 : !/[a-z]/ \|\| !/[A-Z]/ ? 0 : length >= 16 ? 1 : !/\d/ ? 0 : length >= 12 ? 1 : !/[^_\w]/ ? 0 : length >= 8 ? 1 : 0; }` [download] Besides, I think it's cute.	[reply] [d/l]
Re^2: Efficient way to verify scalar contents by haukex (Archbishop) on Jun 24, 2020 at 09:46 UTC
Nice, so far the only other solution that beats the original :-) Rate BillKSmith2 choroba2 LanX2 choroba1 LanX1 haukex1 + BillKSmith1 orig tybalt89 haukex2 BillKSmith2 38227/s -- -55% -60% -72% -72% -83% + -85% -85% -87% -88% choroba2 85658/s 124% -- -10% -38% -38% -62% + -66% -67% -70% -74% LanX2 95639/s 150% 12% -- -31% -31% -57% + -62% -63% -67% -71% choroba1 138099/s 261% 61% 44% -- -1% -38% + -44% -47% -52% -58% LanX1 138866/s 263% 62% 45% 1% -- -38% + -44% -47% -52% -58% haukex1 223763/s 485% 161% 134% 62% 61% -- + -10% -14% -22% -33% BillKSmith1 248507/s 550% 190% 160% 80% 79% 11% + -- -5% -13% -25% orig 260650/s 582% 204% 173% 89% 88% 16% + 5% -- -9% -21% tybalt89 287084/s 651% 235% 200% 108% 107% 28% + 16% 10% -- -13% haukex2 331543/s 767% 287% 247% 140% 139% 48% + 33% 27% 15% -- [download] Read more... Benchmark code (12 kB)	[reply] [d/l] [select]
Re^3: Efficient way to verify scalar contents by tybalt89 (Monsignor) on Jun 24, 2020 at 15:13 UTC
Out of curiosity, I tried tr/// instead of regex, much faster :) `sub validatetr { local $_ = shift; length >= 20 ? 1 : !tr/a-z// \|\| !tr/A-Z// ? 0 : length >= 16 ? 1 : !tr/0-9// ? 0 : length >= 12 ? 1 : !tr/a-zA-Z0-9//c ? 0 : length >= 8 ? 1 : 0; }` [download]	[reply] [d/l]
Re^4: Efficient way to verify scalar contents by haukex (Archbishop) on Jun 25, 2020 at 06:36 UTC


more useful options
	PerlMonks

Efficient way to verify scalar contents

UPDATE