PerlMonks  

regular expression help

by Anonymous Monk
on Jul 26, 2005 at 19:14 UTC ([id://478327])

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello perl gurus, I am trying to write a regular expression to validate a string so that it is in the format yyyy/mm/dd and also within the range 1753/01/01 to 9999/12/31.

I have managed to do this with this simple regex:

((17((5[3-9])|([6-9]\d)))|((18|19)\d\d)|([2-9]\d\d\d))[-/.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])

However, this obviously doesn't take care of leap years and February dates. How can I achieve this? I am completely stuck.
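To see the gap concretely, here is a small sketch (not part of the original question) that anchors the pattern above with ^...$ and compiles it with qr//. The sample dates are made up for illustration. Days that don't exist, such as February 30 and April 31, still match, because the day alternation never looks at the month or the year.

```perl
use strict;
use warnings;

# The pattern from the question, anchored so the whole string must match.
my $re = qr{^((17((5[3-9])|([6-9]\d)))|((18|19)\d\d)|([2-9]\d\d\d))[-/.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$};

# Illustrative inputs: one real date, two impossible ones, one out of range.
for my $date (qw( 2005/07/26 2005/02/30 2005/04/31 1752/12/31 )) {
    printf "%s %s\n", $date, ( $date =~ $re ? 'matches' : 'does not match' );
}
```

Only 1752/12/31 is rejected; the two impossible dates are accepted along with the real one.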

Thanks all

Re: regular expression help
by gellyfish (Monsignor) on Jul 26, 2005 at 19:25 UTC

    Personally, I would use a simple regular expression to split the date string into year, month and day, and then use one of the popular date-handling modules to do the validation. A regular expression that gets the correct number of days for each month is going to be monstrous, if not impossible.
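    A minimal sketch of the "split with a simple regex" step, assuming the yyyy/mm/dd layout from the question. split_date is a hypothetical helper name. It only checks the shape of the string and leaves deciding whether the date actually exists to a module or to separate arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a yyyy/mm/dd string apart with a simple,
# anchored regex. This checks only the shape, not whether the date exists.
sub split_date {
    my ($str) = @_;
    my ( $y, $m, $d ) = $str =~ m{^(\d{4})[-/.](\d{2})[-/.](\d{2})$}
        or return;                          # empty list on a syntax error
    return ( $y + 0, $m + 0, $d + 0 );      # numify: "02" becomes 2
}

my @parts = split_date('2005/02/29');
print "@parts\n";    # prints "2005 2 29"; a module must still reject Feb 29, 2005
```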

    /J\

Re: regular expression help
by Tanktalus (Canon) on Jul 26, 2005 at 19:25 UTC

    I think the simple, straightforward answer is: don't. Use a regular expression to pull apart the data and do minor syntax checking, and use other date-related modules to do semantic checking.

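      Here's a solution that uses the core module Time::Local:

      use strict;
      use warnings;
      use Time::Local qw( timegm );

      $_ = '2004/02/29';    # sample input
      /^(\d{4})[-\/.](\d{2})[-\/.](\d{2})$/ or die("Bad format\n");
      # timegm() dies if a field is out of range, including a day past
      # the end of the month; it expects a 0-based month, hence $2 - 1
      my $time = eval { timegm(0, 0, 0, $3, $2 - 1, $1) };
      die("Bad date\n") if $@;

      Switch timegm for timelocal if you prefer.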
Re: regular expression help
by kwaping (Priest) on Jul 26, 2005 at 19:40 UTC
    I highly recommend exploring Date::Calc; that module is really great for this kind of thing.
Re: regular expression help
by Codon (Friar) on Jul 26, 2005 at 20:35 UTC

    You're trying to make a regex do the work of a subroutine. Checking for a leap day requires divisibility checks on the year, but you only need to worry about that if the month is February and the day is 29.
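    The arithmetic itself is short. The following sketch assumes the Gregorian leap-year rule applies to every year in the 1753 to 9999 range. is_leap_year and valid_ymd are hypothetical names, not functions from any module.

```perl
use strict;
use warnings;

# Gregorian rule: every 4th year is a leap year, except centuries,
# which are leap years only when divisible by 400 (2000 yes, 1900 no).
sub is_leap_year {
    my ($y) = @_;
    return 1 if $y % 400 == 0;
    return 0 if $y % 100 == 0;
    return $y % 4 == 0 ? 1 : 0;
}

# Days per month in a common year; February is adjusted in valid_ymd.
my @month_days = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );

# Hypothetical validator for the question's range (years 1753..9999).
sub valid_ymd {
    my ( $y, $m, $d ) = @_;
    return 0 unless $y >= 1753 && $y <= 9999;
    return 0 unless $m >= 1 && $m <= 12;
    my $max = $month_days[ $m - 1 ];
    $max = 29 if $m == 2 && is_leap_year($y);    # only Feb 29 needs the year
    return $d >= 1 && $d <= $max ? 1 : 0;
}

for my $date (qw( 2000/02/29 1900/02/29 2004/02/29 2005/04/31 )) {
    my ( $y, $m, $d ) = split m{/}, $date;
    printf "%s %s\n", $date, valid_ymd( $y, $m, $d ) ? 'valid' : 'invalid';
}
```

    For these four inputs it reports 2000/02/29 and 2004/02/29 as valid, and 1900/02/29 and 2005/04/31 as invalid.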

    Date::Calc (mentioned above) has a check_date() function that can do exactly what you are looking for (provided you split the date into year, month and day first).

    Ivan Heffner
    Sr. Software Engineer, DAS Lead
    WhitePages.com, Inc.
Re: regular expression help
by ChrisR (Hermit) on Jul 26, 2005 at 20:38 UTC
    There are many ways to do it, but here is a pretty simple and self-explanatory one:

    use strict;
    use warnings;
    use Date::Calc qw(check_date);

    my $date = "9999/12/11";
    my ($year, $month, $day) = $date =~ /^(\d+)\/(\d+)\/(\d+)$/
        or die "Bad format\n";
    print "$year/$month/$day is ";
    if (check_date($year, $month, $day) && $year >= 1753 && $year <= 9999) {
        print "valid\n";
    }
    else {
        print "not valid\n";
    }
    I don't think you are going to be able to do a complete validation using just a regex.

    Chris

Re: regular expression help
by AReed (Pilgrim) on Jul 26, 2005 at 20:16 UTC
    Unless using regexes is a requirement, I wouldn't use them at all for this purpose. I'd use "split" to separate the date string into its component parts and then validate each component separately.

    That has the added benefit of being easier to understand when you take a look at this code again a month from now.
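    A sketch of that split-based approach, assuming "/" is the only separator. check_date_string and all_digits are hypothetical helpers. Each component gets its own check and its own error message, though per-month day limits would still need leap-year logic or a module.

```perl
use strict;
use warnings;

# True if $s is exactly $len characters long and all of them are digits.
# tr/0-9// counts the digits without changing the string.
sub all_digits {
    my ( $s, $len ) = @_;
    return length($s) == $len && ( $s =~ tr/0-9// ) == $len;
}

# Hypothetical checker: split on "/" and validate each component separately.
sub check_date_string {
    my ($str) = @_;
    my @parts = split m{/}, $str, -1;    # -1 keeps trailing empty fields
    return 'wrong number of fields' unless @parts == 3;
    my ( $y, $m, $d ) = @parts;
    return 'year is not 4 digits'  unless all_digits( $y, 4 );
    return 'month is not 2 digits' unless all_digits( $m, 2 );
    return 'day is not 2 digits'   unless all_digits( $d, 2 );
    return 'year out of range'     unless $y >= 1753;    # 4 digits caps it at 9999
    return 'month out of range'    unless $m >= 1 && $m <= 12;
    return 'day out of range'      unless $d >= 1 && $d <= 31;    # month lengths not checked
    return 'ok';
}

print "$_: ", check_date_string($_), "\n" for qw( 2005/07/26 2005/13/01 1752/12/31 2005/7/26 );
```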

Re: regular expression help
by mikeraz (Friar) on Jul 26, 2005 at 21:46 UTC

    I'm going to echo "don't". To start with, as others have pointed out, an RE is the wrong tool for this job, and modules such as Date::Calc are there to do it for you. But then there's the issue of your "simple" regular expression, which is broken. I'm surprised no one pointed it out...

    ((17((5[3-9])|([6-9]\d)))|((18|19)\d\d)|([2-9]\d\d\d))[-/.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])

    The unescaped / in the first [-/.] terminates the RE if you use it in /((17...)/, but even if you encapsulate it in a variable there are still problems:

    #!/usr/bin/perl
    @sampdata = qw( 894/7/14 1752/8/12 1753/12/24 1957/8/30 3998/4/22 9999/3/15 10000/1/1 );
    # single quotes keep \d intact; inside double quotes "\d" becomes a plain "d"
    $re = '((17((5[3-9])|([6-9]\d)))|((18|19)\d\d)|([2-9]\d\d\d))[-/.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])';
    foreach (@sampdata) {
        print;
        print( /$re/ ? " is " : " is not " );
        print " in range 1753 to 9999\n";
    }
    __END__
    894/7/14 is not in range 1753 to 9999
    1752/8/12 is not in range 1753 to 9999
    1753/12/24 is in range 1753 to 9999
    1957/8/30 is not in range 1753 to 9999
    3998/4/22 is not in range 1753 to 9999
    9999/3/15 is not in range 1753 to 9999
    10000/1/1 is not in range 1753 to 9999

    So a change of ((17((5[3-9])|([6-9]\d))) to ((17((5[3-9])|(1([6-9]\d)))) seems to be in order. Re: the two instances of [-/.], was the second one supposed to include a space?
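    One way to narrow down which part of the pattern rejects the sample data (a sketch, not from the original reply): store the question's pattern with qr//, which keeps \d as a digit class, and try the same dates with and without zero-padded months. The padded variants are illustrative inputs. For these inputs the padded forms match and the unpadded ones don't, which suggests that the single-digit months, rather than the year alternation, are what the pattern rejects here.

```perl
use strict;
use warnings;

# The question's pattern, unanchored as in the test above, stored with qr//.
my $re = qr{((17((5[3-9])|([6-9]\d)))|((18|19)\d\d)|([2-9]\d\d\d))[-/.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])};

# Each sample date from the reply, followed by a zero-padded variant.
for my $date (qw( 1957/8/30 1957/08/30 3998/4/22 3998/04/22 9999/3/15 9999/03/15 )) {
    printf "%-10s %s\n", $date, ( $date =~ $re ? 'matches' : 'does not match' );
}
```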

    What seems simple today will be a headache to verify as correct in the future when you're trying to find a real bug.

    Be Appropriate && Follow Your Curiosity
Re: regular expression help
by puploki (Hermit) on Jul 26, 2005 at 19:41 UTC
    I was originally going to reply going "oh, there's this fabulous module that contains regexps for all sorts of common stuff called Regexp::Common", but it doesn't do dates - yet!

    There's another good resource called the regular expressions library - try this listing for some examples.

      There's another good resource called the regular expressions library - try this listing for some examples.
      You apparently haven't seen my rant about that place, nor looked at the number of specific items in the past few months that I've commented on that are broken, misleading, or should be better done without regex.

      That place is a joke. Blind leading the blind.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        You're right of course - I hadn't come across your blog post about it (and being a bit of a newb, haven't seen your perlmonks posts).

        Mixing my metaphors perhaps, but there's more than one way to do it, but some ways are more equal than others :)

Re: regular expression help
by Adam (Vicar) on Jul 27, 2005 at 14:01 UTC
    You should listen to the other monks who directed you to modules. But I wanted to take a stab at doing it in a regex. This code seems to work:
    #!perl -w
    use strict;

    for my $y ( 1753 .. 9999 ) {
    for my $m ( 1 .. 12 ) {
    for my $d ( 1 .. 31 ) {
        my $date = sprintf "%04d/%02d/%02d", $y, $m, $d;
        if ( $date !~ m/^
            ########################
            # Year
            ([2-9]\d{3}|1[89]\d\d|17[6-9]\d|175[3-9])
            \/
            #####################
            # Month
            (0[1-9]|1[0-2])
            \/
            #####################
            # Day
            (
                0[1-9]|1\d|2[0-8]|                        # 01 - 28
                (?<=(?:0[13578]|10|12)\/)(?:29|3[01])|    # to 31
                (?<=(?:0[469]|11)\/)(?:29|30)|            # to 30
                (?<=(?:
                    (?:2[048]|3[26]|4[048]|5[26]|6[048]|7[26]|8[048]|9[26])00|
                    \d\d(?:0[48]|1[26]|2[048]|3[26]|4[048]|5[26]|6[048]|7[26]|8[048]|9[26])
                )\/02\/)(?:29)                            # Leap year
            )
            ########################
        $/x ) {
            print "$date is invalid\n";
        }
        # Else $1 == year, $2 == month, $3 == day
    }}}
    Of course, different countries switched to the Gregorian calendar at different dates, so you really need a module to get it right. My favorite tome on the topic is "Calendrical Calculations" by Edward M. Reingold and Nachum Dershowitz.

    Update: I realized that I made a mistake listing leap-years. I've now fixed that, but it further demonstrates why a module is better.
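    One way to catch a slip like the one in the update is to test the leap-year alternation on its own against the arithmetic rule for every year in range. This sketch copies that alternation into a standalone pattern applied to a bare four-digit year (an adaptation, not Adam's code) and assumes the Gregorian rule throughout. If the two agree, it prints one confirmation line; otherwise it lists the years where they differ.

```perl
use strict;
use warnings;

# The leap-year alternation from the regex above, applied to a bare year.
my $leap_re = qr{^(?:
      (?:2[048]|3[26]|4[048]|5[26]|6[048]|7[26]|8[048]|9[26])00               # centuries divisible by 400
    | \d\d(?:0[48]|1[26]|2[048]|3[26]|4[048]|5[26]|6[048]|7[26]|8[048]|9[26])  # other multiples of 4
)$}x;

# Arithmetic Gregorian rule for comparison.
sub is_leap_year {
    my ($y) = @_;
    return 1 if $y % 400 == 0;
    return 0 if $y % 100 == 0;
    return $y % 4 == 0 ? 1 : 0;
}

# Collect every year where the pattern and the arithmetic disagree.
my @mismatches = grep { ( "$_" =~ $leap_re ? 1 : 0 ) != is_leap_year($_) } 1753 .. 9999;
print @mismatches ? "mismatch: @mismatches\n" : "pattern agrees with the arithmetic rule\n";
```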

Node Type: perlquestion [id://478327]
Approved by gellyfish
Front-paged by friedo