PerlMonks  

Tweak for my Perl Regex that screens for digits only

by hackermike (Novice)
on Jan 25, 2006 at 17:41 UTC ( [id://525497]=perlquestion )

hackermike has asked for the wisdom of the Perl Monks concerning the following question:

I have this regex in my simple Perl script handling an HTML form:
unless ($FORM{'phone'} =~ /\s*\(*\)*\.*\d+\-*\s*/) {
... the point of which is to disallow text in the field whilst allowing for various punctuation styles. Hoping to avoid re-inventing the whole system, I'd like to be able to allow certain text, i.e. ext. or Ext. Any ideas? thx mike

Replies are listed 'Best First'.
Re: Tweak for my Perl Regex that screens for digits only
by g0n (Priest) on Jan 25, 2006 at 18:46 UTC
    You can find some pretty thorough work on parsing phone numbers in Beast of the Number: Parsing the Feral Phone.

    Update: Off the top of my head, my (simplistic) approach would be something like:

    unless ($FORM{phone} =~/^(?:Ext|\d+|\)|\(|\.|\s|\+)+$/i){
    Which seems to tell the difference happily between:
    +44 (1234) 0123456 Ext 123
    and
    This is a test

    --------------------------------------------------------------

    "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."

    John Brunner, "The Shockwave Rider".

      Yes,
      the thing is that even entry-level hackers will just add whatever text they want AFTER the numbers, ext. and more numbers...
      I want to disallow ANY TEXT and allow any number of digits, with the 2 exceptions of ext. and/or Ext.
      Not just a choice of numbers OR text,
      but digits, only digits, and NO TEXT anywhere before or after the permissible digits and ext. or Ext. This also calls into question the use of the ^ and the $,
      as the only permissible string would be the digits with ext. or Ext.
      I'd rather not just have the regex remove the offending text characters:
      an error msg plus having to edit the field makes the spammers spend more effort to get their stuff posted.
        That expression permits only:
        Numbers
        Ext or ext
        The specified punctuation
        anywhere in the string. It fails to match if anything else is included.

        Update: My test case is as follows:

        use strict;
        use warnings;

        my @strings;
        $strings[0] = "44 (1234) 123398 Ext 123";
        $strings[1] = "+44 (1234) 123398 Ext 123";
        $strings[2] = "44 (1234) 123398 Ext 123 xxxxxxxxxxxxxxx";
        $strings[3] = "416-967-1111 ext. 123 xxxxxxxxxxxxxxxxx";

        my $counter = 0;
        for my $string (@strings) {
            if ( $string =~ /^(?:Ext|\d|\)|\(|\.|\s|\+|\-)+$/i ) {
                print "$counter good\n";
            }
            $counter++;
        }

        Which originally returned:

        0 good

        It failed to match with a leading + because I'd missed out a vital | (thanks to ysth for spotting it). With the | in place, as in the code above, string 1 matches as well, and all the wrong strings are still rejected.


        g0n's reply does do what you asked for (mmm, with the exception that it also allows the plus sign (+), but you may want that too).
      Yes,
      But I'm not actually parsing phone numbers as such, I'm just attempting to get whatever the submitter wants posted as a phone number whilst NOT allowing spammers to enter any text. I don't care what the numbers are, as long as it's only numbers - except wanting to permit only the characters ext. and/or Ext. - preferably between number sets.
Re: Tweak for my Perl Regex that screens for digits only
by ikegami (Patriarch) on Jan 25, 2006 at 17:52 UTC
    What you already have matches anything that has at least one digit, including input with letters. You need to have a leading ^ and a trailing $ to perform validation, and you should be using [xyz]* instead of x*y*z*.
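    A minimal sketch of the two points above, assuming the permitted punctuation is whitespace, parentheses, dots and hyphens (the set used in the original pattern). The variable names and sample strings are illustrative only and do not come from the original script.

```perl
use strict;
use warnings;

# Original pattern: unanchored, so it succeeds if a digit appears anywhere.
my $unanchored = qr/\s*\(*\)*\.*\d+\-*\s*/;

# Anchored pattern with a single character class: every character in the
# string must be a digit or one of the permitted punctuation characters.
my $anchored = qr/^[\d\s().\-]+$/;

my @samples = (
    '416-967-1111',
    '(416) 967.1111',
    'buy 2 pills now',
);

my %result;
for my $input (@samples) {
    $result{$input} = {
        unanchored => ( $input =~ $unanchored ? 1 : 0 ),
        anchored   => ( $input =~ $anchored   ? 1 : 0 ),
    };
    printf "%-18s unanchored=%d anchored=%d\n",
        $input, $result{$input}{unanchored}, $result{$input}{anchored};
}
```

    The third sample contains a digit, so the unanchored pattern lets it through, while the anchored character class rejects it.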
      Hi! As nearly as I can tell, this reply does not address the point of the regex's intention:
      It is * NOT * intended as validation for a phone number. It is intended to disallow ANY non-digit characters except spaces, parens and hyphens, which are/can be used in phone # formatting.
      When text or letters are input the regex generates an error msg, so it does not appear to "match" input with letters. My question is only: how to allow ONLY certain word characters, specifically ext. and/or Ext.?
      thx
      mike
        You want to list all the alternatives, separated by vertical bars. For individual character alternatives, you can create a character class (a list or range of characters inside square brackets). For example:
        /^(?:[-()\d\s]|[Ee]xt\.)*$/
        matches a series of (any combination of) only hyphen, left-paren, right-paren, digits, whitespace, or Ext. or ext.

        Updated: added the hyphen. Note that an unescaped hyphen in a character class should be the first or last listed character (so that it doesn't look like part of a character range); anywhere else it needs a backslash.
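        A small harness for trying the pattern above against a few inputs. The helper name phone_ok and the sample strings are hypothetical, and the sketch swaps * for + on the assumption that an empty field should be rejected too.

```perl
use strict;
use warnings;

# Same alternatives as the pattern above: one allowed character, or the
# literal token "ext." / "Ext.". Uses + rather than * so that the empty
# string does not match (an assumption about what the form wants).
my $phone_re = qr/^(?:[-()\d\s]|[Ee]xt\.)+$/;

sub phone_ok {
    my ($input) = @_;
    return $input =~ $phone_re ? 1 : 0;
}

for my $input (
    '416-967-1111 ext. 123',
    '(416) 967-1111 Ext.330',
    '416-967-1111 ext. 123 cheap pills',
    '',
) {
    printf "%-36s %s\n",
        ( length $input ? $input : '(empty)' ),
        ( phone_ok($input) ? 'accepted' : 'rejected' );
}
```

        As written, the pattern insists on the period after ext, so "ext 123" without the dot is rejected; changing \. to \.? would relax that.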


        Caution: Contents may have been coded under pressure.
        When text or letters are input the regex generates an error msg,
        Disallowing is a form of validation. And your regexp doesn't disallow what you claim it disallows. If there's a digit anywhere in the string, it doesn't generate an error message.
        foreach (
            '416-967-1111',
            'I had 2 glasses of orange juice with my breakfast',
            'I had two glasses of orange juice with my breakfast',
        ) {
            unless (/\s*\(*\)*\.*\d+\-*\s*/) {
                print("error message\n");
            }
            else {
                print("no error message\n");
            }
        }

        outputs

        no error message
        no error message
        error message

        rather than the desired

        no error message
        error message
        error message
Re: Tweak for my Perl Regex that screens for digits only
by mojotoad (Monsignor) on Feb 05, 2006 at 16:25 UTC
    If the whole point of this exercise is to limit input into a form, why not provide separate fields for phone numbers vs extensions and limit each field to numbers only? That way you dodge the whole issue of spammers entering text, but allow the entry of extensions.

    Cheers,
    Matt

      Thanks for your reply. I don't know why I was not notified, as I want to acknowledge contributions to my query. Certainly you have a good point about using the form fields themselves to constrain input. One thing to consider is whether adding yet more fields to a form is OK, especially as a form gets ever more complex in the effort to get the data desired, and whether the form designer would prefer NOT to encourage more data to handle. Form design already has to consider separate fields for Last Name and First Name as well as 5 or 6 address fields, even when the data parts are not used separately. Handling the zip 4/5 carrier route requires separate fields to get that right, or you have to allow for the hyphen, and for those who just HAVE TO enter two hyphens, and then there are the spaces that seem to surround hyphens, and copy 'n' paste, which will pick up spaces, and on and on. So I thought that, in this case, having one field that allowed for certain non-digits would be a useful exercise to add another tool to the box. Thanks to all who have wrestled with this conundrum, as it was less straightforward than I would ever have thought. Mike
Re: Tweak for my Perl Regex that screens for digits only
by ptum (Priest) on Jan 25, 2006 at 17:59 UTC

    And of course, there is always Super Search. I found pricing and phone number regexes, which might help you.


    No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Tweak for my Perl Regex that screens for digits only
by Fletch (Bishop) on Jan 25, 2006 at 18:05 UTC
Re: Tweak for my Perl Regex that screens for digits only
by radiantmatrix (Parson) on Jan 27, 2006 at 18:37 UTC

    It seems like you are looking for a generalized way to allow certain groups of characters (primarily), with an addition to allow certain strings.

    First, think about which chars you'd like to allow. Since this is a phone number, these are spaces, digits, hyphens, parentheses, and periods (dots). The key here is that this is the class of chars you'd like to allow. So, to check if a form has only these chars:

    # true if $string consists solely of allowed chars (and isn't empty)
    $string =~ m/^[\s\d\-\(\)\.]+$/;

    # or, we could be true if $string contains anything except allowed chars
    $string =~ m/[^\s\d\-\(\)\.]/;

    The second approach uses the char-class negation -- the regex will be true if it finds at least one char not in the set. I like this approach less, so the remainder of examples will follow the other pattern (true if "good").
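    For what it's worth, the negated class can also be used to say which characters were objected to, which may help when writing the error message. A rough sketch, where offending_chars is a hypothetical helper name and the sample inputs are made up:

```perl
use strict;
use warnings;

# Return the distinct characters in $input that are not digits, whitespace,
# hyphens, parentheses or dots. An empty list means the input is clean.
sub offending_chars {
    my ($input) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $input =~ /([^\s\d\-().])/g;
}

for my $input ( '(416) 967-1111', '416-967-1111 call now!' ) {
    my @bad = offending_chars($input);
    if (@bad) {
        printf "%s: rejected, unexpected characters: %s\n",
            $input, join( ' ', @bad );
    }
    else {
        printf "%s: accepted\n", $input;
    }
}
```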

    Your next step is to deal with how people might specify extensions. The forms I have seen are "x1234", variations on "ext. 1234", and "(1234)". The last is already addressed in the regex we have, because it uses all legal chars. We address the first option ("x1234") by adding 'x' as a legal char in our class. We then deal with the "ext. 1234" style by checking for the forms of "ext." (I choose to allow "ext" and "Ext" with an optional period).

    $string =~ m/^(?:[\s\d\-\(\)\.x]|[Ee]xt\.?)+$/;

    Now, that's fairly complex and somewhat hard to understand if you don't already grok regex. It's also easy to make a mistake maintaining it. Here's an alternate approach:

    sub is_valid {
        # check to see if an entry consists of valid "phone number" chars.
        my ($phone) = @_;
        $phone =~ s/[\-\.\s\(\)]+//g;    # remove punct. and spaces

        # return true if it begins with digits, optionally followed by
        # Ext or ext or x and more digits.
        return ( $phone =~ m/^\d+(?:(?:[Ee]xt|x)\d+)?$/ );
    }

    By removing spaces and punctuation, we've made our task a little easier: we can validate that something has all legal chars, and do a minimal level of format checking as well (e.g. extensions must be at the end, if provided; everything begins and ends with digits {not counting punctuation}, etc.).

    This might be used like:

    unless ( is_valid( $FORM{'phone'} ) ) {
        send_error("$FORM{'phone'} doesn't meet validity test");
        exit;
    }
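    To see how the strip-then-check idea behaves, here is a self-contained sketch run over a few made-up inputs. The name is_valid_sketch is hypothetical (so it does not clash with the sub above), and the rule it enforces, digits first and then an optional Ext/ext/x followed by digits, is one reading of the description rather than the only possible one.

```perl
use strict;
use warnings;

# Strip the permitted punctuation and whitespace, then require digits,
# optionally followed by Ext, ext or x and more digits.
sub is_valid_sketch {
    my ($phone) = @_;
    return 0 unless defined $phone;
    ( my $stripped = $phone ) =~ s/[\-.\s()]+//g;
    return $stripped =~ m/^\d+(?:(?:[Ee]xt|x)\d+)?$/ ? 1 : 0;
}

for my $input (
    '416-967-1111',
    '(416) 967-1111 ext. 330',
    '416.967.1111 x330',
    '416-967-1111 ext. 330 buy now',
    'ext. 330',
) {
    printf "%-32s %s\n", $input,
        ( is_valid_sketch($input) ? 'valid' : 'invalid' );
}
```

    Working on a copy of the input keeps the original string intact for the error message.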

    If readability isn't a concern (but really, when isn't it?), you could combine the ideas into a complex regex:

    $phone =~ m{^[\-\.\s\(\)]*\d[\d\-\.\s\(\)]*(?:(?:[Ee]xt\.?|x)[\s\.\-]*\d+)?\s*$};

    All said, you might find it easier to alter your entire approach, creating an HTML form that has two fields, 'phone' and 'ext', then checking that the 'phone' is all digits and punctuation and the ext is all digits (or empty). Like this:

    # valid number
    unless ( $FORM{'phone'} =~ /^\d+[\s\d\-\(\)\.]+$/ ) {
        die "Bad phone";
    }

    # valid ext, if it exists
    unless ( $FORM{'ext'} =~ /^\d+$/ ) {
        # don't die if zero-length!
        die "Bad ext" if length( $FORM{'ext'} );
    }

    These regexen and code snippets are untested, and so might have some problems. Not nearly enough coffee today -- don't say I didn't warn you!

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: Tweak for my Perl Regex that screens for digits only
by stony (Initiate) on Jan 25, 2006 at 18:04 UTC
    $number =~ /^\s*(\(\d{3}[)\-]\s*)?\d{3}\-?\d{4}(\s*ext[\.\:]?\s*\d+)?\s*$/i
    I haven't tested it, but it should allow:
    • (123) 123-1234
    •  (123)123-3456
    • 1234567890
    • (123) 456-9876 ext 345
    • (456-6789 Ext.  5567894
    • and many more....
      • It doesn't match 416-967-1111. This format is used more than (416) 967-1111 here.
      • It doesn't match when the country code is supplied. (e.g. 1-416-967-1111)
      • It doesn't match x as the extension separator. (e.g. ... x330)
      • It doesn't (necessarily) match any numbers from outside of North America.
      Thanks for your input
      The thing is that there is NO intention to force ANY number format or sequence, no intention to "validate" that it is a "real" phone number or match any particular country format, etc.

      All that is intended is that the user has to enter only digits and punctuation, any way they choose, any number of digits or sets of digits and punctuation. The only exception I'd like to allow is only the characters ext. and/or Ext.
