Parsing oddball dates

by SavannahLion (Pilgrim)
on May 27, 2004 at 06:10 UTC ( [id://356814] : perlquestion )

SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:

This is one of those puzzles that I thought would be a breeze to tackle.

What I have is lots and lots of files that are filled with dates (along with other data). Getting the dates out of the files is surprisingly easy. They're always in the same relative places in all of the files.

Translating/converting these dates into something that's a bit more standard is the tough part. Initially, the Regex started out very small. Something like this:

my $date = "5/12/1998";
$date =~ m|(\d*)/(\d*)/(\d*)|;

Then I encountered some files where spaces were added in between the digits. (I assume to keep the single digit days/months lined up with double digit days/months.)

my $date = " 5/ 2/1998";
$date =~ m|\s?(\d*)\s?/\s?(\d*)\s?/\s?(\d*)|;

Of course, you can guess what else I encountered: dates where the month is named instead of numeric, such as Jan/1/1998; short dates without the slashes, such as Jan 1, 1998; long dates, such as January 1, 1998; and two-digit-year dates, such as 1/1/88.

Pretty soon, my regex started looking really ugly. It got to the point where I was spending more time adding new rules to the regex than finishing the rest of the code that parses the other data.

The only major aberrant format is a date that is missing the actual day, such as February 1988. As far as I can tell, all of them follow the conventional U.S. order of month, day, then year.

So I come to the Monks. After stumbling over yet another rule change to the Regex, I realized that this can't be such a unique problem. Chances are some person or persons encountered the exact same issues and created a workable Regex/module that I can utilize to read and translate these dates into something more standardized. Can someone please help direct me to this Regex/Module, if it exists?

Thanks for your patience.
Prove your knowledge @ HLPD

Replies are listed 'Best First'.
Re: Parsing oddball dates
by davido (Cardinal) on May 27, 2004 at 06:22 UTC
    Look on CPAN for Date::Manip.

    use Date::Manip;
    my $string = '5/27/2004';    # Just about any imaginable format.
    my $date = ParseDate( $string );

    With the above snippet, $date will contain a standardized date format. This module is great.
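
As a rough sketch of how that might be applied to the formats in the question, the snippet below feeds a few of the sample strings through ParseDate and reformats the result with UnixDate (both exported by Date::Manip). The helper name normalize_date and the YYYY-MM-DD output format are assumptions chosen for illustration, and which strings parse can depend on the installed Date::Manip version, so the loop simply reports what happened rather than promising a result.

```perl
use strict;
use warnings;
use Date::Manip;    # exports ParseDate and UnixDate by default

# Hypothetical helper (not part of Date::Manip): returns YYYY-MM-DD,
# or undef when ParseDate cannot make sense of the string.
sub normalize_date {
    my ($string) = @_;
    my $parsed = ParseDate($string);
    return unless $parsed;                  # false value means "not parsed"
    return UnixDate($parsed, '%Y-%m-%d');
}

# Sample strings taken from the question.
for my $string ('5/12/1998', 'Jan 1, 1998', 'January 1, 1998',
                '1/1/88', 'February 1988') {
    my $iso = normalize_date($string);
    printf "%-20s => %s\n", "'$string'",
        defined $iso ? $iso : '(not parsed)';
}
```

Anything reported as "(not parsed)" can then be collected and handled by a fallback rule instead of growing one large regex.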


Re: Parsing oddball dates
by EdwardG (Vicar) on May 27, 2004 at 07:59 UTC

    I had a similar problem reading dates from a text file in order to report on projected developer headcount, and here's what worked for me:

    use Date::Manip;                      # For ParseDate()'s ability to
                                          # handle non-ISO date formats
                                          # like "today"
    use Date::Calc qw (Decode_Date_EU);   # For Decode_Date_EU()'s ability
                                          # to handle extra bits that
                                          # ParseDate() doesn't
    use Date::Simple qw (today);          # For convenience, and for
                                          # compatibility with Date::Range

    sub DateSimple {
        # Try hard to turn a string into Date::Simple
        my $workdate = shift;

        # First try ParseDate
        my $date = UnixDate(ParseDate($workdate), "%Y-%m-%d");

        # If that didn't work then give Decode_Date_EU a try
        unless ($date) {
            $workdate =~ s/(\d)[a-z]{2}/$1/ig;   # Remove number suffixes
            if (my ($year, $month, $day) = Decode_Date_EU($workdate)) {
                $date = sprintf("%04i-%02i-%02i", $year, $month, $day);
            }
        }

        $date ? Date::Simple->new($date) : undef;
    }


Re: Parsing oddball dates
by bsb (Priest) on May 27, 2004 at 07:22 UTC
    Date::Calc function Decode_Date_EU:
    This function scans a given string and tries to parse any date which might be embedded in it.
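
Since the question says the dates follow the U.S. order of month, day, then year, the sibling function Decode_Date_US from the same module may be the closer fit. The following is a minimal sketch under that assumption; the helper name us_to_iso and the YYYY-MM-DD output format are made up for illustration. Month-only strings such as "February 1988" are not covered here and would need a separate check.

```perl
use strict;
use warnings;
use Date::Calc qw(Decode_Date_US);

# Hypothetical helper (not part of Date::Calc): returns YYYY-MM-DD,
# or undef when Decode_Date_US finds no date in the string.
sub us_to_iso {
    my ($string) = @_;
    # Decode_Date_US returns an empty list when it cannot find a date.
    my ($year, $month, $day) = Decode_Date_US($string)
        or return;
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

# Sample strings taken from the question.
for my $string ('5/12/1998', 'Jan 1, 1998', 'January 1, 1998') {
    my $iso = us_to_iso($string);
    printf "%-20s => %s\n", "'$string'",
        defined $iso ? $iso : '(not parsed)';
}
```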