$user_name =~ s/[\W_]//g;
The pattern [\W_] breaks down as follows:
- [...] any of the characters from the character class consisting of ...
- \W any "non-word" character ...
- _ or an underscore
However, instead of just silently cleaning data, I'd prefer to check
the string for undesirable characters and notify the user if it is bad,
so that they can fix it:
$user_name =~ /[\W_]/ and
warn "user name is bad\n";
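To make the check-and-notify idea concrete, here is a minimal sketch. The helper name is_clean_user_name and the sample inputs are my own assumptions for illustration, not anything from the original question:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the name contains only characters
# that \w accepts, minus the underscore; returns 0 otherwise.
sub is_clean_user_name {
    my ($name) = @_;
    return 0 unless defined $name && length $name;   # reject undef and ''
    return $name =~ /[\W_]/ ? 0 : 1;                 # any bad character fails
}

# Illustrative inputs only.
for my $candidate ('alice42', 'bob_smith', 'eve!') {
    if (is_clean_user_name($candidate)) {
        print "ok: $candidate\n";
    }
    else {
        warn "user name is bad: $candidate\n";
    }
}
```

Rejecting the empty string is an extra assumption on my part; drop that line if an empty name is acceptable in your application.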
One caveat here: POSIX.
POSIX locale support can, on some systems, alter the definition of \W so that its conventional meaning, "[^a-zA-Z0-9_]", is not exactly what you expect it to be.
According to Friedl (the Owl book, "Mastering Regular Expressions", 1st edition, pp. 65-66 and 257) (paraphrasing...):
- POSIX can alter the meaning of \w and \W to include what other languages consider to be word characters.
- "Locales can influence many tools that do not aspire to POSIX compliance, sometimes without their knowledge! ... If the non-POSIX utility is compiled on a system with a POSIX-compliant C library, some support can be bestowed, although the exact amount can be hit or miss. For example, the tool's author might have used the C library functions for capitalization issues, but not for \w support."
- It is sometimes necessary to use [a-zA-Z0-9_] rather than \w. According to Friedl: "...a friend ran into a problem in which his version of Perl treated certain non ASCII bytes as [accented characters]..."
Therefore, it is in some cases advisable to use the following construction to accomplish the task described in the subject line of this thread:
$user_name =~ s/[^a-zA-Z0-9]//g;
Or with case insensitivity:
$user_name =~ s/[^a-z0-9]//gi;
Of course this solution more accurately answers the question: "How do I purge anything other than Alpha/Numeric data from a variable?"
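To see the difference in practice, here is a small sketch. The sample string is an assumption chosen for illustration: it contains \x{142} (LATIN SMALL LETTER L WITH STROKE), a non-ASCII letter that Perl treats as a word character in this string:

```perl
use strict;
use warnings;

# Illustrative input: "Micha" + U+0142 + "_1!"
my $input = "Micha\x{142}_1!";

# \W relies on Perl's notion of a word character, so the non-ASCII
# letter and the underscore both survive; only '!' is removed.
(my $by_shorthand = $input) =~ s/\W//g;

# The explicit class keeps ASCII letters and digits and nothing else.
(my $by_explicit = $input) =~ s/[^a-zA-Z0-9]//g;

binmode STDOUT, ':encoding(UTF-8)';
print "shorthand: $by_shorthand\n";
print "explicit:  $by_explicit\n";
```

The explicit class is more typing, but what it keeps does not depend on locale or Unicode settings.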
Dave
"If I had my life to do over again, I'd be a plumber." -- Albert Einstein
As long as you are willing to concede that an underscore '_' is alphanumeric, you can use this:
$user_name =~ s/\W//g;
\w is shorthand for the character class [A-Za-z0-9_]
and \W is the inverse of that set, i.e. [^\w].
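A quick side-by-side sketch of what the underscore concession means (the sample string is just an illustration):

```perl
use strict;
use warnings;

my $input = 'j_doe-99!';

# \W alone leaves the underscore in place, since _ counts as a word character.
(my $keeps_underscore = $input) =~ s/\W//g;

# Adding _ to the class removes it along with the other non-word characters.
(my $drops_underscore = $input) =~ s/[\W_]//g;

print "$keeps_underscore\n";   # j_doe99
print "$drops_underscore\n";   # jdoe99
```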
Your regex says, "find the alphanumeric characters in $user_name, and replace them with nothing."
You want the opposite:
$user_name =~ s/[^a-zA-Z0-9]//g;
The ^ at the beginning of the character class inverts the set, i.e. "all things not in this character class".
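Here is a small sketch of the two behaviours next to each other. I am assuming the original attempt used the same class without the ^, and the sample string is made up:

```perl
use strict;
use warnings;

my $input = 'user-01@example';

# Without ^ : the alphanumerics themselves are deleted.
(my $without_caret = $input) =~ s/[a-zA-Z0-9]//g;

# With ^ : everything except the alphanumerics is deleted.
(my $with_caret = $input) =~ s/[^a-zA-Z0-9]//g;

print "without ^: $without_caret\n";   # -@
print "with ^:    $with_caret\n";      # user01example
```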
For deleting everything outside a fixed set of characters, the right tool is tr///, not s///:
$user_name =~ tr/0-9a-zA-Z//dc;
(You can add the underscore character, or any others you like, of course.)
If you wanted the username to look like a valid Perl identifier (i.e., begin with a letter, then alphanumerics + underscores), you would then want to strip off the leading non-letters:
$user_name =~ s/^[^a-z]*//i;
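Putting the two steps together, as a rough sketch: the sample value is made up, and I have added the underscore to the tr/// list as mentioned above. Note that tr/// returns the number of characters it deleted, which is handy if you want to tell the user that something was stripped:

```perl
use strict;
use warnings;

my $user_name = '42_john.doe!';

# /c complements the list, /d deletes whatever matches the complement.
# The return value is the count of deleted characters.
my $removed = ($user_name =~ tr/0-9a-zA-Z_//cd);

# Then drop any leading characters that are not letters.
$user_name =~ s/^[^a-z]*//i;

print "removed $removed character(s), result: $user_name\n";
```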