
Re: number of unique characters in a string

by Northpass Kid (Initiate)
on Oct 21, 2011 at 19:36 UTC ( [id://932972]=note )


in reply to Re: number of unique characters in a string
in thread number of unique characters in a string

This thread is a bit old, but here is my take on the problem using only a regex. I needed to check a string entered as a new password against several format restrictions, including having at least five different characters. I was adding this to existing code that expected a compiled regex to test the string, so I didn't have the option of using additional commands. The matching regex just had to succeed or fail.

use Data::Dumper;

my $unqchar5_regex = qr/^(?{%_=()})(?:(.)(?{$_{$1}++}))+ (??{(scalar(keys %_)<5)?~$1:''})$/x;

my $pswd = "abcdef";
print(($pswd =~ $unqchar5_regex) ? "Pass\n" : "Fail\n");
print Dumper \%_;

$pswd = "abcdbd";
print(($pswd =~ $unqchar5_regex) ? "Pass\n" : "Fail\n");
print Dumper \%_;

The contents of %_ are a bonus: rather than just setting the character as a hash key, I increment it, so you end up with a per-character count. I use %_ because it is a global that needs no declaration. If you are worried about collisions with other uses of it, you can give the hash a different name, but you will have to declare that variable somewhere in the code if use strict 'vars' is on.
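For anyone who prefers a named hash, here is a minimal sketch of the same pattern. The name %char_count is hypothetical (it is not in the code above); declaring it with our makes it a package variable, which satisfies strict.

```perl
use strict;
use warnings;

# %char_count is a hypothetical name standing in for %_.
# 'our' declares a package variable, so 'use strict' does not complain.
our %char_count;

my $unqchar5_regex = qr/^(?{ %char_count = () })
                        (?:(.)(?{ $char_count{$1}++ }))+
                        (??{ (scalar(keys %char_count) < 5) ? ~$1 : '' })
                        $/x;

# Same two test strings as in the post.
for my $pswd ("abcdef", "abcdbd") {
    print "$pswd: ", ($pswd =~ $unqchar5_regex ? "Pass" : "Fail"), "\n";
}
```

With these two inputs the first should pass (six distinct characters) and the second should fail (only four).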

Here's what is going on:

qr/^                             # Beginning of string.
    (?{%_=()})                   # Clear the counting hash (zero width op).
    (?:                          # Group the matching of the character with the setting 
                                 #   of the count hash, but don't collect the value.
       (.)                       # Match just one char and collect it.
       (?{$_{$1}++})             # Use the char as the key in the hash and count it (zero width op).
    )+                           # Do the collect and count for as many chars as we have.
    (??{                         # Eval this code and use its val as a pattern (zero width op).
        (scalar(keys %_)<5)      # Perform test for number of unique characters.
                           ?~$1  # If not what we want, foul the pattern to make the match fail.
                           :''   # Otherwise don't change the pattern so match succeeds.
       })
   $/x;                          # Anchor end of string (x allows whitespace and comments in the pattern).

Really any boolean test can be performed on the hash. Just have it evaluate to the empty string ('') if the regex should succeed, or to ~$1 if it should fail. The tilde (~) in ~$1 is the bitwise complement operator, so whatever character (byte value) is in $1, ~$1 is a different byte and will not match it. I needed to do this because my list of valid characters was ALL characters.
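As an illustration of swapping in a different test, here is a sketch under a made-up rule: no single character may appear more than twice. The rule, the regex name $max2_regex and the hash name %char_count are all hypothetical. It also swaps ~$1 for the always-failing pattern (?!), which is not what the original code uses but does not depend on the byte value in $1.

```perl
use strict;
use warnings;

our %char_count;    # hypothetical name, plays the role of %_

# Hypothetical rule: fail if any character occurs more than twice.
# '(?!)' is an empty negative lookahead, a pattern that can never match;
# it is used here in place of ~$1 to force the failure.
my $max2_regex = qr/^(?{ %char_count = () })
                    (?:(.)(?{ $char_count{$1}++ }))+
                    (??{ (grep { $_ > 2 } values %char_count) ? '(?!)' : '' })
                    $/x;

for my $pswd ("aabbcc", "aaab") {
    print "$pswd: ", ($pswd =~ $max2_regex ? "Pass" : "Fail"), "\n";
}
```

Here "aabbcc" should pass (every count is 2) and "aaab" should fail (the count for "a" is 3).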

Here is a version with some debugging output:

my $unqchar5_regex = qr/^(?{%_=()})
    (?:(.)(?{$_{$1}++}))+
    (?{warn Dumper \%_})
    (??{warn scalar(keys %_)." $1\n"; (scalar(keys %_)<5)?~$1:'' })
    $/x;

If you run this you will see the single-character group step through and match the entire string. If the boolean test then adds nothing to the pattern, everything has matched and the regex succeeds. If instead the test adds ~$1 to the pattern, there are no characters left to match it, so the match fails at that point, but the regex engine still backtracks through the string to be sure. With ASCII input the complement always has its high bit set, so it is a byte that cannot occur anywhere in the string, and the match fails.

In the end we did not use this code at all, because the password-cracking tester already did this check. :P Hopefully this will be of use to someone.
