Parsing oddball dates
by SavannahLion (Pilgrim) on May 27, 2004 at 06:10 UTC
SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:
This is one of those puzzles that I thought would be a breeze to tackle.
What I have is lots and lots of files that are filled with dates (along with other data). Getting the dates out of the files is surprisingly easy. They're always in the same relative places in all of the files.
Translating/converting these dates into something that's a bit more standard is the tough part. Initially, the Regex started out very small. Something like this:
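A minimal sketch of that kind of starting pattern, assuming purely numeric month/day/year dates with a four-digit year. This is not the poster's original code; the helper name `parse_numeric_date` and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical starting point: one or two digits, slash, one or two digits,
# slash, four-digit year.
my $date_re = qr{\b(\d{1,2})/(\d{1,2})/(\d{4})\b};

# Return the first date found in $text as YYYY-MM-DD, or undef if none.
sub parse_numeric_date {
    my ($text) = @_;
    return unless $text =~ $date_re;
    my ($month, $day, $year) = ($1, $2, $3);
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

for my $text ('Filed 1/1/1998 by clerk', 'Due 12/25/2003') {
    my $iso = parse_numeric_date($text);
    print "$iso\n" if defined $iso;
}
```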
Then I encountered some files where spaces were added in between the digits. (I assume to keep the single digit days/months lined up with double digit days/months.)
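As a rough illustration of how the pattern grows at this step, the sketch below allows optional whitespace around each slash. It assumes the padding looks like ` 1/ 1/1998`; the helper name `parse_padded_date` is hypothetical.

```perl
use strict;
use warnings;

# Same numeric pattern, but with optional whitespace on either side of each
# slash. The lookarounds keep the match from starting or ending mid-number.
my $padded_re = qr{(?<!\d)(\d{1,2})\s*/\s*(\d{1,2})\s*/\s*(\d{4})(?!\d)};

# Return the first date found in $text as YYYY-MM-DD, or undef if none.
sub parse_padded_date {
    my ($text) = @_;
    return unless $text =~ $padded_re;
    my ($month, $day, $year) = ($1, $2, $3);
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

for my $text (' 1/ 1/1998', '12/25/2003') {
    my $iso = parse_padded_date($text);
    print "$iso\n" if defined $iso;
}
```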
Of course, you can guess what else I encountered: dates where the month is named instead of numeric, such as Jan/1/1998; short dates without the slashes, such as Jan 1, 1998; long dates, such as January 1, 1998; and two-digit-year dates, such as 1/1/88.
Pretty soon, my Regex started looking really ugly. It got to the point where I was spending more time adding new rules to the Regex than finishing the rest of the code to parse the other data.
The only major aberrant date format is one that is missing the actual day, such as February 1988. As far as I can tell, all of the dates follow the conventional U.S. order of month, day, then year.
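One way to keep the rules from piling up in a single Regex is to match loosely and then validate the pieces in code. The sketch below is an illustration under stated assumptions, not a finished solution. The name `normalize_date` is hypothetical, the input is assumed to be an already-extracted date string in U.S. order, and the two-digit-year pivot (50-99 means 19xx, 00-49 means 20xx) is a guess that would need checking against the real files.

```perl
use strict;
use warnings;

# Map the first three letters of each English month name to its number.
my %month_num;
@month_num{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = 1 .. 12;

# Month token: one or two digits, or a word checked later against %month_num.
my $month = qr{\d{1,2}|[A-Za-z]{3,9}};
# Separator: a slash, comma, or whitespace, with optional padding around it.
my $sep = qr{\s*[/,\s]\s*};

# Return YYYY-MM-DD, or YYYY-MM when the day is missing, or undef on failure.
sub normalize_date {
    my ($text) = @_;

    # Month, optional day, then a four- or two-digit year.
    my ($m, $d, $y) = $text =~ m{
        \A \s* ($month) (?: $sep (\d{1,2}) )? $sep (\d{4}|\d{2}) \s* \z
    }x or return;

    # Deliberately permissive: only the first three letters of a name count.
    if ($m =~ /^[A-Za-z]/) {
        $m = $month_num{ lc(substr($m, 0, 3)) } or return;
    }
    return if $m < 1 || $m > 12;
    return if defined($d) && ($d < 1 || $d > 31);

    # ASSUMPTION: two-digit years 50-99 are 19xx, 00-49 are 20xx.
    $y += ($y >= 50 ? 1900 : 2000) if length($y) == 2;

    return defined($d)
        ? sprintf('%04d-%02d-%02d', $y, $m, $d)
        : sprintf('%04d-%02d', $y, $m);
}

for my $raw ('Jan/1/1998', 'Jan 1, 1998', 'January 1, 1998',
             '1/1/88', ' 1/ 1/1998', 'February 1988') {
    my $iso = normalize_date($raw);
    printf "%-18s => %s\n", "'$raw'", defined($iso) ? $iso : 'unparsed';
}
```

Keeping the month lookup and range checks outside the pattern means a new format usually needs only a small change to `$month` or `$sep`, not another branch in one long Regex. Day-of-month validation here is only a range check; it does not reject dates such as February 31.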
So I come to the Monks. After stumbling over yet another rule change to the Regex, I realized that this can't be such a unique problem. Chances are some person or persons have encountered the exact same issues and created a workable Regex/module that I can use to read and translate these dates into something more standardized. Can someone please help direct me to this Regex/module, if it exists?