PerlMonks  

Re: How do I keep anything other than alphanumeric out of a variable?

by DrHyde (Prior)
on Aug 26, 2003 at 12:48 UTC


in reply to How do I keep anything other than alphanumeric out of a variable?

The following removes every character that is not alphanumeric, including underscores:

$user_name =~ s/[\W_]//g;

The pattern [\W_] breaks down as follows:

  • [...] any of the characters from the character class consisting of ...
  • \W any "non-word" character (anything other than a letter, digit, or underscore) ...
  • _ or an underscore
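As a minimal sketch of the substitution in use, the snippet below wraps it in a small helper. The name strip_non_alnum and the sample input are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the input with every
# non-word character and every underscore removed.
sub strip_non_alnum {
    my ($text) = @_;
    (my $clean = $text) =~ s/[\W_]//g;   # copy first, then edit the copy
    return $clean;
}

print strip_non_alnum("j_smith-42!"), "\n";   # prints "jsmith42"
```

Working on a copy leaves the caller's string untouched, which helps if you still want the original value for a log or an error message.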

However, instead of just silently cleaning data, I'd prefer to check the string for undesirable characters and notify the user if it is bad, so that they can fix it:

$user_name =~ /[\W_]/ and warn "user name is bad\n";
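The same check can be packaged as a predicate, roughly as sketched here. The name is_valid_user_name is hypothetical, and rejecting empty or undefined names is an extra assumption that the one-liner above does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the name is non-empty and contains
# no non-word characters and no underscores.
sub is_valid_user_name {
    my ($name) = @_;
    # Assumption: an undefined or empty name counts as invalid.
    return 0 unless defined $name && length $name;
    return $name =~ /[\W_]/ ? 0 : 1;
}

for my $name ('jsmith42', 'j_smith', 'j smith') {
    if (is_valid_user_name($name)) {
        print "ok: $name\n";
    }
    else {
        warn "user name '$name' is bad\n";
    }
}
```

Because the pattern is unanchored, a single offending character anywhere in the string, including a trailing newline, is enough to reject the name.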

Re: Answer: How do I keep anything other than Alpha/Numeric data out of a variable?
by davido (Cardinal) on Aug 26, 2003 at 19:01 UTC
    One caveat here: POSIX.

    On some systems, POSIX locale settings can alter the definition of \W, so that it no longer has its conventional meaning of "[^a-zA-Z0-9_]".

    According to Friedl (the Owl book, "Mastering Regular Expressions", 1st edition, pp. 65-66 and 257), paraphrasing:

    • POSIX can alter the meaning of \w and \W to include what other languages consider to be word characters.
    • "Locales can influence many tools that do not aspire to POSIX compliance, sometimes without their knowledge! ... If the non-POSIX utility is compiled on a system with a POSIX-compliant C library, some support can be bestowed, although the exact amount can be hit or miss. For example, the tool's author might have used the C library functions for capitalization issues, but not for \w support."
    • It is sometimes necessary to use [a-zA-Z0-9_] rather than \w. According to Friedl: "...a friend ran into a problem in which his version of Perl treated certain non ASCII bytes as [accented characters]..."

    Therefore, it is in some cases advisable to use the following construction to accomplish the task described in the subject line of this thread:

    $user_name =~ s/[^a-zA-Z0-9]//g;

    Or with case insensitivity:

    $user_name =~ s/[^a-z0-9]//gi;

    Of course this solution more accurately answers the question: "How do I purge anything other than Alpha/Numeric data from a variable?"
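    To see how the two character classes can disagree, here is a small sketch. Locale behaviour varies from system to system, so it uses a Unicode letter rather than a locale setting to trigger the difference; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# "\x{3b1}" is GREEK SMALL LETTER ALPHA. Perl treats it as a word
# character, so \W does not match it.
my $input = "user\x{3b1}_1";

(my $by_word_class  = $input) =~ s/[\W_]//g;          # keeps the alpha
(my $by_ascii_class = $input) =~ s/[^a-zA-Z0-9]//g;   # removes the alpha

# Print lengths rather than the strings to avoid "Wide character" warnings.
printf "%d %d\n", length $by_word_class, length $by_ascii_class;   # prints "6 5"
```

    Only the explicit ASCII class guarantees that nothing outside a-z, A-Z and 0-9 survives.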

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
