http://qs321.pair.com?node_id=630124

nimdokk has asked for the wisdom of the Perl Monks concerning the following question:

I am building a tool to take some user input and then automatically set up directories and other tasks based on that input. I'm trying to make sure that the input is something that can be used. I want to make sure the input consists only of alphanumeric characters (no problem), the underscore character (no problem), and the dash "-". If I test with the \W+ regex, it works fine except that it also rejects the dash. I need to expand the regex somehow, but I am at a loss right now on how to do this. Below is the snippet I've been working on:

use strict;
use warnings;

foreach my $dir (<DATA>) {
    chomp $dir;
    print "DIR: $dir\n";
    if ( $dir =~ /\W+/ ) {
        print "\t$dir is not clean.\n";
    }
    else {
        print "\t$dir is clean.\n";
    }#close if
}#close foreach

__DATA__
test
1234
test_1234
test-1234
test 1234
test?1234
test+1234
test.1234

The first four items should return as "Clean" while the last four should return as "Not Clean". I've tried reversing to match Word (\w+) but because there are valid characters in all the items, they all return as "Clean". Any suggestions would be appreciated. Thanks.

Re: Regex Problem
by FunkyMonk (Chancellor) on Aug 01, 2007 at 18:00 UTC
    You can use a negated character class: [^-\w] (match any character that isn't a \w or dash):
    if ( $dir =~ /[^-\w]/ ) {
        print "\t$dir is not clean.\n";
    }
    else {
        print "\t$dir is clean.\n";
    }#close if

    See perlretut, perlrequick and perlre for the details.
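As a rough sketch of how that negated class behaves on the eight sample names from the question, the check can be wrapped in a small sub. The name `is_clean` is made up for illustration, and rejecting the empty string is an assumption about what counts as a usable directory name, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true when $name contains
# nothing but word characters (\w) and dashes.
# Assumption: an undefined or empty name is treated as not clean.
sub is_clean {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '';
    # [^-\w] matches any single character that is neither a dash nor \w.
    return $name =~ /[^-\w]/ ? 0 : 1;
}

my @samples = (
    'test', '1234', 'test_1234', 'test-1234',
    'test 1234', 'test?1234', 'test+1234', 'test.1234',
);

for my $dir (@samples) {
    printf "%-10s %s\n", $dir, is_clean($dir) ? 'clean' : 'not clean';
}
```

With this check the first four names come out clean and the last four do not, which is the split the original poster asked for.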

      That looks a little weird with the minus in the middle of the character class. Makes it look like a range. I'd sooner write that as

      if ( $dir =~ /[^\w-]/ ) {

      • another intruder with the mooring in the heart of the Perl

        I was (mistakenly) under the impression that a character class that ended with a dash was a syntax error, because perl treated it as though it were an unfinished range.

        After reading your post, I tried it in Perl, and of course it works fine. So, I thought it must be an AWKism, but no, it works there too.

        In short, I have no idea why I thought you couldn't end a character class with a dash. Let's just say it must be an age thing.

        Thanks for the enlightenment.
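To make the point about dash placement concrete, here is a small sketch comparing three spellings of the same class: dash first, dash last, and dash escaped. The sub name `dash_class_results` and the hash labels are invented for this example. A dash is only read as a range operator when it sits between two characters, so all three spellings should agree.

```perl
use strict;
use warnings;

# Three spellings of "any character other than a word character or a dash".
my %spellings = (
    'dash first'   => qr/[^-\w]/,
    'dash last'    => qr/[^\w-]/,
    'dash escaped' => qr/[^\w\-]/,
);

# Hypothetical helper: returns label => 1/0 for each spelling,
# where 1 means the name contains a disallowed character.
sub dash_class_results {
    my ($name) = @_;
    return map { $_ => ( $name =~ $spellings{$_} ? 1 : 0 ) }
           sort keys %spellings;
}

for my $name ( 'test-1234', 'test+1234' ) {
    my %hit = dash_class_results($name);
    print "$name: ", join( ', ', map { "$_=$hit{$_}" } sort keys %hit ), "\n";
}
```

Which spelling to use is a readability choice; the escaped form makes it explicit that no range is intended.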

Re: Regex Problem
by ikegami (Patriarch) on Aug 01, 2007 at 18:55 UTC

    The opposite of "A non-word character" (/\W/) is "No non-word characters" (!/\W/) or "all word characters" (/^\w*\z/). That's why /\w/ didn't work.
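The difference between these tests can be sketched as follows. The sub name `verdicts` and the hash keys are made up for illustration, and the last entry extends the anchored form to allow the dash the original poster wanted, which goes slightly beyond what this reply states.

```perl
use strict;
use warnings;

# Hypothetical helper: runs each test against one string and
# returns name => 1/0 pairs.
sub verdicts {
    my ($s) = @_;
    return (
        has_word_char    => ( $s =~ /\w/         ? 1 : 0 ),  # ANY char is \w
        no_nonword_char  => ( $s !~ /\W/         ? 1 : 0 ),  # NO char is \W
        all_word_chars   => ( $s =~ /^\w*\z/     ? 1 : 0 ),  # every char is \w
        all_word_or_dash => ( $s =~ /^[\w-]*\z/  ? 1 : 0 ),  # \w or dash only
    );
}

for my $s ( 'test_1234', 'test-1234', 'test?1234' ) {
    my %v = verdicts($s);
    print "$s: ", join( ', ', map { "$_=$v{$_}" } sort keys %v ), "\n";
}
```

Note that `/\w/` succeeds for all three strings because each contains at least one word character, which is why simply switching `\W` to `\w` reported everything as clean. Both `!/\W/` and `/^\w*\z/` also accept the empty string, so a real check may want `+` in place of `*`.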

Re: Regex Problem
by ww (Archbishop) on Aug 01, 2007 at 17:56 UTC
    regex comment: Character class.

    name comment: why would you want to accept a dir name with a space, but not one with a plus or a dot?

      The space should not be allowed either.