PerlMonks  

Date::Parse - how to correctly parse dates between 1901 and 1969

by eniad (Acolyte)
on Feb 19, 2018 at 19:29 UTC

eniad has asked for the wisdom of the Perl Monks concerning the following question:

I am parsing dates and datetimes input by users who aren't too careful with their formatting. Date::Parse seems great because it handles most cases I need to handle.

Except datetimes between 1901-01-01 00:00:00 and 1968-12-31 23:59:59, as I found out today. For those datetimes, Date::Parse str2time adds an extra 100 years when it parses the datetime to epoch time.

Here is the code I am using to parse the datetimes:

    #!/usr/bin/perl
    #---------------------------------------------------------------------
    # format_date.pl
    #
    # format variable date inputs
    #---------------------------------------------------------------------
    use strict;
    use warnings;
    use Date::Parse;
    use DateTime;

    my $DEFAULT_TIME_ZONE = "GMT";
    my @dates = (
        "1899-06-24 09:44:00",
        "1900-12-31 23:59:59",
        "1901-01-01 00:00:00",
        "1960-12-31 23:59:59",
        "1966-06-24 09:44:00",
        "1968-12-31 23:59:59",
        "1969-01-01 00:00:00",
        "1969-12-31 23:59:59",
        "1970-01-01 00:00:01",
        "2000-01-01 00:00:00",
        "2017-06-24 23:59:59",
        "2018-06-24 09:44:00",
        "2238-06-24 09:44:00"
    );

    foreach my $string (@dates) {

        # format datetime field from any valid datetime input
        # default time zone is used if timezone is not included in string
        my $epoch = str2time( $string, $DEFAULT_TIME_ZONE );

        # error if date is not correctly parsed
        if ( !$epoch ) {
            die("ERROR ====> invalid datetime ($string), "
                . "datetime format should be YYYY-MM-DD HH:MM:SS");
        }
        my $date = DateTime->from_epoch( epoch => $epoch );

        printf( "formatting datetime: value = %20s, epoch = %20u, "
              . "date = %20s\n", $string, $epoch, $date );
    }

    exit 0;

Side note: I need to improve my error handling because the valid date 1970-01-01 00:00:00 will throw an error.
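
The side note above can be illustrated with a minimal sketch. It assumes the parser signals failure with undef (which is how the str2time result behaves when assigned to a scalar) and uses the core Time::Local module to produce the epoch 0 case. The helper name is_valid_epoch is hypothetical and not part of any module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: an epoch is usable when it is defined,
# even if its numeric value is 0 (false in boolean context).
sub is_valid_epoch {
    my ($epoch) = @_;
    return defined $epoch ? 1 : 0;
}

# 1970-01-01 00:00:00 GMT is epoch 0
my $epoch = timegm( 0, 0, 0, 1, 0, 1970 );

print "truthiness check rejects epoch $epoch\n" if !$epoch;
print "defined check accepts epoch $epoch\n"    if is_valid_epoch($epoch);
```

With this kind of check, `if ( !defined $epoch )` would replace `if ( !$epoch )` in the script above.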

The additional 100 years for dates between 1901 and 1969 can be seen in the output:

    formatting datetime: value =  1899-06-24 09:44:00, epoch = 18446744071484095456, date =  1899-06-24T09:44:00
    formatting datetime: value =  1900-12-31 23:59:59, epoch = 18446744071532098815, date =  1900-12-31T23:59:59
    formatting datetime: value =  1901-01-01 00:00:00, epoch =            978307200, date =  2001-01-01T00:00:00
    formatting datetime: value =  1960-12-31 23:59:59, epoch =           2871763199, date =  2060-12-31T23:59:59
    formatting datetime: value =  1966-06-24 09:44:00, epoch =           3044598240, date =  2066-06-24T09:44:00
    formatting datetime: value =  1968-12-31 23:59:59, epoch =           3124223999, date =  2068-12-31T23:59:59
    formatting datetime: value =  1969-01-01 00:00:00, epoch = 18446744073678015616, date =  1969-01-01T00:00:00
    formatting datetime: value =  1969-12-31 23:59:59, epoch = 18446744073709551615, date =  1969-12-31T23:59:59
    formatting datetime: value =  1970-01-01 00:00:01, epoch =                    1, date =  1970-01-01T00:00:01
    formatting datetime: value =  2000-01-01 00:00:00, epoch =            946684800, date =  2000-01-01T00:00:00
    formatting datetime: value =  2017-06-24 23:59:59, epoch =           1498348799, date =  2017-06-24T23:59:59
    formatting datetime: value =  2018-06-24 09:44:00, epoch =           1529833440, date =  2018-06-24T09:44:00
    formatting datetime: value =  2238-06-24 09:44:00, epoch =           8472332640, date =  2238-06-24T09:44:00

The Date::Parse documentation suggests it can handle dates at least as old as 1901-01-01. The Time::Local documentation suggests it should be able to handle dates even older.

How should I handle this oddity? Is there a better way to parse variable input formats?

Replies are listed 'Best First'.
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by soonix (Canon) on Feb 19, 2018 at 20:33 UTC
    Not exactly. Time::Local says that, depending on the Perl version, it relies on the system's time_t, which holds the number of seconds since 1970-01-01 00:00. That number is negative for datetimes before 1970-01-01.

    You are interpreting the epoch value as unsigned, which makes it "jump" between 1969-12-31 23:59 and 1970-01-01 00:00.

      How should I correctly interpret the value as signed?
        With a %d format, as per sprintf. BTW your data indicates another wraparound between 1900-12-31 and 1901-01-01.
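
As a small illustration of the signed versus unsigned point, the sketch below formats one negative epoch with both %d and %u, then maps one of the large unsigned numbers from the question back to its signed value. It assumes the large values in the question are 64-bit wraparounds. The helper unsigned64_to_signed is hypothetical; Math::BigInt is a core module.

```perl
use strict;
use warnings;
use Math::BigInt;

# Format the same negative epoch as signed (%d) and unsigned (%u).
my $epoch    = -31536000;                  # 1969-01-01 00:00:00 GMT
my $signed   = sprintf( "%d", $epoch );
my $unsigned = sprintf( "%u", $epoch );    # wraps; exact value depends on integer width

# Hypothetical helper: reinterpret a 64-bit unsigned decimal string as signed.
sub unsigned64_to_signed {
    my ($text) = @_;
    my $value = Math::BigInt->new($text);
    my $limit = Math::BigInt->new(2)->bpow(63);
    # Values at or above 2**63 represent negative numbers: subtract 2**64
    $value->bsub( Math::BigInt->new(2)->bpow(64) ) if $value >= $limit;
    return $value->bstr;
}

print "signed:   $signed\n";
print "unsigned: $unsigned\n";
print "restored: ", unsigned64_to_signed("18446744073678015616"), "\n";
```

Switching the printf in the original script from %20u to %20d would print the negative epochs directly, without any conversion.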

        Therefore, I wouldn't use epoch seconds here, anyway. Perhaps one of the DateTime::Format modules can become your friend?

Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by tangent (Parson) on Feb 19, 2018 at 22:55 UTC
    The 100 years oddity would seem to be related to issue #105031 for Date::Parse:
    After str2time uses strptime to break up the incoming date, it passes the result (with $year - 1900) to Time::Local::timelocal. timelocal uses a sliding window to determine if the year should be 19xx or 20xx, completely throwing away the *known* four-digit year that we sent to str2time.
    Time::Local interprets the (two digit) date like this:
    Years in the range 0..99 are interpreted as shorthand for years in the rolling "current century," defined as 50 years on either side of the current year. Thus, today, in 1999, 0 would refer to 2000, and 45 to 2045, but 55 would refer to 1955. Twenty years from now, 55 would instead refer to 2055. This is messy, but matches the way people currently think about two digit dates.
    Years 1968 and 1969 mark the crossover of that window - i.e. 1968 is on one side of 2018 (50 years), and 1969 is on the other (49 years)...
        my @dates = (
            "1901-01-01 00:00:00",
            "1968-12-31 23:59:59",
            "1969-01-01 00:00:00",
            "1969-12-31 23:59:59",
            "1970-01-01 00:00:01",
        );
        for my $string (@dates) {
            my $epoch = str2time( $string, 'GMT' );
            print "$string ($epoch seconds)\n";
            my $date = DateTime->from_epoch( epoch => $epoch );
            print $date->ymd, " ", $date->hms, "\n\n";
        }
    OUTPUT:
        1901-01-01 00:00:00 (978307200 seconds)
        2001-01-01 00:00:00

        1968-12-31 23:59:59 (3124223999 seconds)
        2068-12-31 23:59:59

        1969-01-01 00:00:00 (-31536000 seconds)
        1969-01-01 00:00:00

        1969-12-31 23:59:59 (-1 seconds)
        1969-12-31 23:59:59

        1970-01-01 00:00:01 (1 seconds)
        1970-01-01 00:00:01
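
The rolling-century rule quoted above can be sketched in a few lines. This is a simplified model written from the quoted documentation and the results in this thread, not Time::Local's actual code; expand_two_digit_year is a hypothetical helper, and the current year is passed in explicitly so the result is reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper: map a two-digit year (0..99) to the four-digit year
# that lies within roughly 50 years of $current_year.
sub expand_two_digit_year {
    my ( $yy, $current_year ) = @_;

    # Start in the current century, then slide by 100 years if needed
    my $candidate = $current_year - ( $current_year % 100 ) + $yy;
    $candidate -= 100 if $candidate > $current_year + 50;
    $candidate += 100 if $candidate <= $current_year - 50;
    return $candidate;
}

# In 2018, 68 lands in the future and 69 in the past, as seen in the output above
for my $yy ( 1, 68, 69 ) {
    printf "%02d -> %d\n", $yy, expand_two_digit_year( $yy, 2018 );
}
```

Under this model, a year such as 1968 that has been reduced to 68 comes back as 2068, which matches the 100-year shift reported in the question.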
     
    To verify that the problem is with Date::Parse you can use Time::Local directly to show it returns the correct results if you give it the 4 digit year:
        use Time::Local;
        use DateTime;

        # Arguments for timegm: sec, min, hour, mday, month (0-based), 4-digit year
        my %hash = (
            "1901-01-01 00:00:00" => [00,00,00,01,00,1901],
            "1968-12-31 23:59:59" => [59,59,23,31,11,1968],
            "1969-01-01 00:00:00" => [00,00,00,01,00,1969],
            "1969-12-31 23:59:59" => [59,59,23,31,11,1969],
            "1970-01-01 00:00:01" => [01,00,00,01,00,1970],
        );
        # Sorted keys are in chronological order for these strings
        for my $string (sort keys %hash) {
            my $array = $hash{$string};
            my $epoch = timegm( @$array );
            print "$string ($epoch seconds)\n";
            my $date = DateTime->from_epoch( epoch => $epoch );
            print $date->ymd, " ", $date->hms, "\n\n";
        }
    OUTPUT:
        1901-01-01 00:00:00 (-2177452800 seconds)
        1901-01-01 00:00:00

        1968-12-31 23:59:59 (-31536001 seconds)
        1968-12-31 23:59:59

        1969-01-01 00:00:00 (-31536000 seconds)
        1969-01-01 00:00:00

        1969-12-31 23:59:59 (-1 seconds)
        1969-12-31 23:59:59

        1970-01-01 00:00:01 (1 seconds)
        1970-01-01 00:00:01
      This explains my difficulty. Thank you. Now that I understand where the `$year - 1900` is happening, I can figure out how to reliably parse then format the input date. I will post my updated code.
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by johngg (Canon) on Feb 19, 2018 at 23:27 UTC

    The core Time::Piece module seems to handle most of your dates but anything earlier than the start of the 20th century can't be parsed and causes an error.

        use strict;
        use warnings;
        use feature qw{ say };

        use Time::Piece;

        my @dates = (
            q{1960-12-31 23:59:59},
            q{1966-06-24 09:44:00},
            q{1968-12-31 23:59:59},
            q{1969-01-01 00:00:00},
            q{1969-12-31 23:59:59},
            q{1970-01-01 00:00:01},
            q{2000-01-01 00:00:00},
            q{2017-06-24 23:59:59},
            q{2018-06-24 09:44:00},
            q{2238-06-24 09:44:00},
            q{1900-12-31 23:59:59},
            q{1901-01-01 00:00:00},
            q{1900-01-01 00:00:00},
            q{1899-12-31 23:59:59},
        );

        foreach my $date ( @dates ) {
            my $tp = Time::Piece->strptime( $date, q{%Y-%m-%d %T} );
            say $date, q{ -> epoch }, $tp->epoch();
        }

    The output.

        1960-12-31 23:59:59 -> epoch -283996801
        1966-06-24 09:44:00 -> epoch -111161760
        1968-12-31 23:59:59 -> epoch -31536001
        1969-01-01 00:00:00 -> epoch -31536000
        1969-12-31 23:59:59 -> epoch -1
        1970-01-01 00:00:01 -> epoch 1
        2000-01-01 00:00:00 -> epoch 946684800
        2017-06-24 23:59:59 -> epoch 1498348799
        2018-06-24 09:44:00 -> epoch 1529833440
        2238-06-24 09:44:00 -> epoch 8472332640
        1900-12-31 23:59:59 -> epoch -2177452801
        1901-01-01 00:00:00 -> epoch -2177452800
        1900-01-01 00:00:00 -> epoch -2208988800
        Error parsing time at /usr/lib/x86_64-linux-gnu/perl/5.22/Time/Piece.pm line 469.

    I hope this is of interest.

    Cheers,

    JohnGG

      I am using Date::Parse to parse datetimes that may be in different formats. The dates have come in like YYYYMMDD, YYYY-MM-DD, YYYYMMDD HH:MM, etc. Time::Piece->strptime() requires a defined input format. Time::Piece would certainly work once the datetime is well formatted. Is there an advantage to Time::Piece over DateTime?
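
One way to work around the fixed-format requirement is to try a short list of candidate formats in turn. The following is a minimal sketch under stated assumptions: the format list is a guess based on the input styles mentioned above, and parse_any is a hypothetical helper, not part of Time::Piece. It only covers formats that are listed explicitly, so it is far less flexible than Date::Parse.

```perl
use strict;
use warnings;
use Time::Piece;

# Candidate formats, most specific first. This list is an assumption
# based on the input styles mentioned above; extend it as needed.
my @FORMATS = (
    '%Y-%m-%d %H:%M:%S',
    '%Y-%m-%d %H:%M',
    '%Y-%m-%d',
    '%Y%m%d %H:%M',
    '%Y%m%d',
    '%m/%d/%Y',
);

# Hypothetical helper: returns a Time::Piece object,
# or undef when none of the formats matches.
sub parse_any {
    my ($string) = @_;
    for my $format (@FORMATS) {
        # strptime dies on a mismatch, so trap the error and try the next format
        my $tp = eval { Time::Piece->strptime( $string, $format ) };
        return $tp if defined $tp;
    }
    return undef;
}

for my $input ( '2000-01-01 00:00:00', '20000101', '01/01/2000', 'not a date' ) {
    my $tp = parse_any($input);
    printf "%-20s -> %s\n", $input, defined $tp ? $tp->epoch : 'unparsed';
}
```

Two-digit years and free-form input are deliberately left out of this sketch.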

        The only advantage is that it has been a core module since Perl 5.10 so it doesn't have to be installed from CPAN. I don't have much experience with Time::Piece and none at all with Date::Parse so can't really comment on usability.

        Cheers,

        JohnGG

Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by eniad (Acolyte) on Feb 20, 2018 at 00:22 UTC

    Solution 1

    To handle the $year - 1900 bug mentioned by tangent, I have switched to the Date::Parse strptime function. That allows me to ensure a 4-digit year. It fails for years before 1000 (0618 comes out as 2518 below). I will address that if it comes up, which is unlikely but possible.

    Here is my updated example script with better formatted output:

        #!/usr/bin/perl
        #---------------------------------------------------------------------
        # format_date.pl
        #
        # format variable date inputs
        #---------------------------------------------------------------------
        use strict;
        use warnings;
        use Date::Parse;
        use DateTime;

        my $DEFAULT_TIME_ZONE = "GMT";
        my @dates = (
            "0618-01-01 00:00:00",
            "1066-10-14 00:00:00",
            "1899-06-24 09:44:00",
            "1900-12-31 23:59:59",
            "1901-01-01 00:00:00",
            "1960-12-31 23:59:59",
            "1966-06-24 09:44:00",
            "1968-12-31 23:59:59",
            "1969-01-01 00:00:00",
            "1969-12-31 23:59:59",
            "1970-01-01 00:00:01",
            "2000-01-01 00:00:00",
            "2017-06-24 23:59:59",
            "2018-06-24 09:44:00",
            "2238-06-24 09:44:00"
        );

        # define format for printf statements
        my $pstr = "%-19s %02s,%02s,%02s,%02s,%02s,%04s,%01s %-19s\n";
        my $pfrm = "%19s %02u,%02u,%02u,%02u,%02u,%04u,%01u %19s\n";

        printf( $pstr, "value", "ss", "mm", "hh", "dy", "mo", "year", "z", "date" );

        foreach my $string (@dates) {

            # format datetime field from any valid datetime input
            # default time zone is used if timezone is not included in string
            my @datetime = strptime( $string, $DEFAULT_TIME_ZONE );

            # error if date is not correctly parsed
            if ( scalar @datetime == 0 ) {
                die( "ERROR ====> invalid datetime ($string), "
                      . "datetime format should be YYYY-MM-DD HH:MM:SS" );
            }

            my ( $ss, $mm, $hh, $day, $month, $year, $zone ) = @datetime;

            if ( $year < 1000 ) {
                $year += 1900;
            }

            my %datetimehash = (
                year       => $year,
                month      => $month + 1,
                day        => $day,
                hour       => $hh,
                minute     => $mm,
                second     => $ss,
                nanosecond => 0,
                time_zone  => $zone,
            );

            my $date = DateTime->new(%datetimehash);

            printf( $pfrm, $string, $ss, $mm, $hh, $day, $month, $year, $zone, $date );
        }

        exit 0;

    Here is the output:

        value               ss,mm,hh,dy,mo,year,z date
        0618-01-01 00:00:00 00,00,00,01,00,2518,0 2518-01-01T00:00:00
        1066-10-14 00:00:00 00,00,00,14,09,1066,0 1066-10-14T00:00:00
        1899-06-24 09:44:00 00,44,09,24,05,1899,0 1899-06-24T09:44:00
        1900-12-31 23:59:59 59,59,23,31,11,1900,0 1900-12-31T23:59:59
        1901-01-01 00:00:00 00,00,00,01,00,1901,0 1901-01-01T00:00:00
        1960-12-31 23:59:59 59,59,23,31,11,1960,0 1960-12-31T23:59:59
        1966-06-24 09:44:00 00,44,09,24,05,1966,0 1966-06-24T09:44:00
        1968-12-31 23:59:59 59,59,23,31,11,1968,0 1968-12-31T23:59:59
        1969-01-01 00:00:00 00,00,00,01,00,1969,0 1969-01-01T00:00:00
        1969-12-31 23:59:59 59,59,23,31,11,1969,0 1969-12-31T23:59:59
        1970-01-01 00:00:01 01,00,00,01,00,1970,0 1970-01-01T00:00:01
        2000-01-01 00:00:00 00,00,00,01,00,2000,0 2000-01-01T00:00:00
        2017-06-24 23:59:59 59,59,23,24,05,2017,0 2017-06-24T23:59:59
        2018-06-24 09:44:00 00,44,09,24,05,2018,0 2018-06-24T09:44:00
        2238-06-24 09:44:00 00,44,09,24,05,2238,0 2238-06-24T09:44:00
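
The year-before-1000 limitation (0618 becoming 2518) could be avoided by preferring a literal four-digit year when the input starts with one. The sketch below is only an illustration: four_digit_year is a hypothetical helper, and the fallback assumes, as the output above suggests, that the parsed year has had 1900 subtracted only when it is below 1000.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the four-digit year for a parsed date.
#   $input       - the original date string
#   $parsed_year - the year element returned by the parser
sub four_digit_year {
    my ( $input, $parsed_year ) = @_;

    # Prefer a literal leading YYYY (YYYY-MM-DD or YYYYMMDD) when present
    return $1 + 0 if $input =~ /^\s*(\d{4})(?:-\d{2}-\d{2}|\d{4})\b/;

    # Otherwise fall back to the same rule used in Solution 1
    return $parsed_year < 1000 ? $parsed_year + 1900 : $parsed_year;
}

printf "%-19s -> %d\n", @$_, four_digit_year(@$_)
    for ( [ "0618-01-01 00:00:00", 618 ], [ "1968-12-31 23:59:59", 68 ] );
```

Inputs without a leading four-digit year, such as 02/20/18, still go through the fallback and remain ambiguous.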

    Thank you all for the help. It took a bit to get my head wrapped around this one.

    EDIT: Solution 2

    Based on discussion with thanos1983, I explored using the Date::Manip module to parse datetimes.

    This simplified parsing variable inputs greatly. It even handles 2-digit years correctly.

    Here is the updated code and output:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Date::Manip;
        use feature 'say';

        my $DEFAULT_TIME_ZONE = "GMT";
        my @dates = (
            "0618-01-01 00:00:00",    # interpreted as 2518-01-01 by Date::Parse
            "1066-10-14 00:00:00",
            "1899-06-24 09:44:00",
            "1900-12-31 23:59:59",
            "1901-01-01 00:00:00",
            "1960-12-31 23:59:59",
            "1968-12-31 23:59:59",
            "1969-01-01 00:00:00",
            "1969-12-31 23:59:59",
            "1970-01-01 00:00:01",
            "2000-01-01 00:00:00",
            "2018-02-20 00:00:00",
            "20180220",
            "02/20/2018",
            "02/20/18",               # interpreted as 1918-02-20 by Date::Parse
            "2018-02-20",
            "2238-02-20 09:44:00"
        );

        # print each parsed date next to its original input
        say "Well formatted date  Variable input date";
        say UnixDate( ParseDate($_), '%Y-%m-%d %T' ) . qq{  $_} for (@dates);

        exit 0;

        __END__

        $ format_date.pl
        Well formatted date  Variable input date
        0618-01-01 00:00:00  0618-01-01 00:00:00
        1066-10-14 00:00:00  1066-10-14 00:00:00
        1899-06-24 09:44:00  1899-06-24 09:44:00
        1900-12-31 23:59:59  1900-12-31 23:59:59
        1901-01-01 00:00:00  1901-01-01 00:00:00
        1960-12-31 23:59:59  1960-12-31 23:59:59
        1968-12-31 23:59:59  1968-12-31 23:59:59
        1969-01-01 00:00:00  1969-01-01 00:00:00
        1969-12-31 23:59:59  1969-12-31 23:59:59
        1970-01-01 00:00:01  1970-01-01 00:00:01
        2000-01-01 00:00:00  2000-01-01 00:00:00
        2018-02-20 00:00:00  2018-02-20 00:00:00
        2018-02-20 00:00:00  20180220
        2018-02-20 00:00:00  02/20/2018
        2018-02-20 00:00:00  02/20/18
        2018-02-20 00:00:00  2018-02-20
        2238-02-20 09:44:00  2238-02-20 09:44:00

        The input can be in variable formats:

            my @dates = (
                "2018-02-20 00:00:00",
                "20180220",
                "02/20/2018",
                "02/20/18",    # interpreted as 1918-02-20
                "2018-02-20"
            );

        Date::Parse handles those (with some massaging). DateTime::Format::Strptime requires well formatted dates to begin with.

        Am I missing a module of DateTime or Time::Piece that can handle the different input formats?

Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by thanos1983 (Parson) on Feb 20, 2018 at 11:37 UTC

    Hello eniad,

    It seems that fellow Monks have already addressed your problem, but I want to add something minor here. Since there is a minor bug in the module, would you consider using another module? For example, I put together a very simple example using my favorite module for date manipulation, Date::Manip.

    The module accepts many date formats; for a brief overview see Date::Manip::Examples. It can convert a human-readable date to epoch and vice versa in one step.

    I also included a minor comment in case you want to play with different time zone(s).

    Sample code below:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Date::Manip;
        use feature 'say';

        my @dates = (
            "1899-06-24 09:44:00",
            "1900-12-31 23:59:59",
            "1901-01-01 00:00:00",
            "1960-12-31 23:59:59",
            "1966-06-24 09:44:00",
            "1968-12-31 23:59:59",
            "1969-01-01 00:00:00",
            "1969-12-31 23:59:59",
            "1970-01-01 00:00:01",
            "2000-01-01 00:00:00",
            "2017-06-24 23:59:59",
            "2018-06-24 09:44:00",
            "2238-06-24 09:44:00"
        );

        foreach my $datestr (@dates) {
            my $epochSecs = UnixDate($datestr,'%s');
            my $date = UnixDate( ParseDateString("epoch $epochSecs"), "%Y-%m-%d %T");
            say "Date value = ".$datestr.", epoch = ".$epochSecs.", date = ".$date;
        }

        =timezone

        my $timezone = UnixDate( Date_ConvTZ( "today", 'CET', 'PST' ), "%Y-%m-%d %T");
        say $timezone;

        =cut

        __END__

        $ perl test.pl
        Date value = 1899-06-24 09:44:00, epoch = -2225459760, date = 1899-06-24 09:44:00
        Date value = 1900-12-31 23:59:59, epoch = -2177456401, date = 1900-12-31 23:59:59
        Date value = 1901-01-01 00:00:00, epoch = -2177456400, date = 1901-01-01 00:00:00
        Date value = 1960-12-31 23:59:59, epoch = -284000401, date = 1960-12-31 23:59:59
        Date value = 1966-06-24 09:44:00, epoch = -111165360, date = 1966-06-24 09:44:00
        Date value = 1968-12-31 23:59:59, epoch = -31539601, date = 1968-12-31 23:59:59
        Date value = 1969-01-01 00:00:00, epoch = -31539600, date = 1969-01-01 00:00:00
        Date value = 1969-12-31 23:59:59, epoch = -3601, date = 1969-12-31 23:59:59
        Date value = 1970-01-01 00:00:01, epoch = -3599, date = 1970-01-01 00:00:01
        Date value = 2000-01-01 00:00:00, epoch = 946681200, date = 2000-01-01 00:00:00
        Date value = 2017-06-24 23:59:59, epoch = 1498341599, date = 2017-06-24 23:59:59
        Date value = 2018-06-24 09:44:00, epoch = 1529826240, date = 2018-06-24 09:44:00
        Date value = 2238-06-24 09:44:00, epoch = 8472325440, date = 2238-06-24 09:44:00
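
The epochs in this output differ from the GMT-based epochs in the Time::Piece reply earlier in the thread. The sketch below just does the arithmetic on values copied from the two outputs. The reading that the strings were parsed in a local time zone one hour ahead of GMT, with daylight saving time in summer, is an assumption and is not confirmed anywhere in the thread.

```perl
use strict;
use warnings;

# Epochs copied from the Time::Piece output (treated as GMT)
my %gmt_epoch = (
    '1960-12-31 23:59:59' => -283996801,
    '1970-01-01 00:00:01' => 1,
    '2000-01-01 00:00:00' => 946684800,
    '2018-06-24 09:44:00' => 1529833440,
);

# Epochs copied from the Date::Manip output above
my %manip_epoch = (
    '1960-12-31 23:59:59' => -284000401,
    '1970-01-01 00:00:01' => -3599,
    '2000-01-01 00:00:00' => 946681200,
    '2018-06-24 09:44:00' => 1529826240,
);

# Difference in seconds between the two interpretations of one date string
sub offset_seconds {
    my ($date) = @_;
    return $gmt_epoch{$date} - $manip_epoch{$date};
}

for my $date ( sort keys %gmt_epoch ) {
    printf "%s  offset = %d s\n", $date, offset_seconds($date);
}
```

If GMT epochs are needed, the input time zone has to be specified explicitly; the Date::Manip documentation describes how to configure it.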

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Can Date::Manip parse variable input formats? I have gotten inputs like:

          my @dates = (
              "2018-02-20 00:00:00",
              "20180220",
              "02/20/2018",
              "02/20/18",    # interpreted as 1918-02-20
              "2018-02-20"
          );

        Hello again eniad,

        Well a simple example would answer your question:

            #!/usr/bin/perl
            use strict;
            use warnings;
            use Date::Manip;
            use feature 'say';

            my @dates = (
                "2018-02-20 00:00:00",
                "20180220",
                "02/20/2018",
                "02/20/18",    # interpreted as 1918-02-20
                "2018-02-20",
                "today"
            );

            say UnixDate( ParseDate($_), "%Y-%m-%d") for (@dates);

            __END__

            $ perl test.pl
            2018-02-20
            2018-02-20
            2018-02-20
            2018-02-20
            2018-02-20
            2018-02-20

        So in conclusion, yes the module can parse all the dates that you provided.

        Update: To see which date formats the module accepts, read Date::Manip::Date/VALID DATE FORMATS. The same page also lists the time formats and the combined date and time formats.

        Update 2: A minor similar example on how to parse time and print also time zone if you are interested:

        Hope this helps, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by haukex (Archbishop) on Feb 19, 2018 at 20:31 UTC

    Crossposted to StackOverflow. Crossposting is acceptable, but it is considered polite to inform about it so that efforts are not duplicated.

      Apologies! I considered linking, but got distracted. Thank you for linking to the crosspost.
