eniad has asked for the wisdom of the Perl Monks concerning the following question:
I am parsing dates and datetimes input by users who aren't too careful with their formatting. Date::Parse seems great because it handles most cases I need to handle.
Except datetimes between 1901-01-01 00:00:00 and 1968-12-31 23:59:59, as I found out today. For those datetimes, Date::Parse str2time adds an extra 100 years when it parses the datetime to epoch time.
Here is the code I am using to parse the datetimes:
#!/usr/bin/perl
#---------------------------------------------------------------------
# format_date.pl
#
# format variable date inputs
#---------------------------------------------------------------------
use strict;
use warnings;
use Date::Parse;
use DateTime;
my $DEFAULT_TIME_ZONE = "GMT";
my @dates = (
"1899-06-24 09:44:00",
"1900-12-31 23:59:59",
"1901-01-01 00:00:00",
"1960-12-31 23:59:59",
"1966-06-24 09:44:00",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
"2000-01-01 00:00:00",
"2017-06-24 23:59:59",
"2018-06-24 09:44:00",
"2238-06-24 09:44:00"
);
foreach my $string (@dates) {
# format datetime field from any valid datetime input
# default time zone is used if timezone is not included in string
my $epoch = str2time( $string, $DEFAULT_TIME_ZONE );
# error if date is not correctly parsed
if ( !$epoch ) {
die("ERROR ====> invalid datetime ($string), "
. "datetime format should be YYYY-MM-DD HH:MM:SS");
}
my $date = DateTime->from_epoch( epoch => $epoch );
printf( "formatting datetime: value = %20s, epoch = %20u, "
. "date = %20s\n", $string, $epoch, $date );
}
exit 0;
Side note: I need to improve my error handling because the valid date 1970-01-01 00:00:00 will throw an error.
The additional 100 years for dates between 1901 and 1969 can be seen in the output:
formatting datetime: value = 1899-06-24 09:44:00, epoch = 18446744071484095456, date = 1899-06-24T09:44:00
formatting datetime: value = 1900-12-31 23:59:59, epoch = 18446744071532098815, date = 1900-12-31T23:59:59
formatting datetime: value = 1901-01-01 00:00:00, epoch = 978307200, date = 2001-01-01T00:00:00
formatting datetime: value = 1960-12-31 23:59:59, epoch = 2871763199, date = 2060-12-31T23:59:59
formatting datetime: value = 1966-06-24 09:44:00, epoch = 3044598240, date = 2066-06-24T09:44:00
formatting datetime: value = 1968-12-31 23:59:59, epoch = 3124223999, date = 2068-12-31T23:59:59
formatting datetime: value = 1969-01-01 00:00:00, epoch = 18446744073678015616, date = 1969-01-01T00:00:00
formatting datetime: value = 1969-12-31 23:59:59, epoch = 18446744073709551615, date = 1969-12-31T23:59:59
formatting datetime: value = 1970-01-01 00:00:01, epoch = 1, date = 1970-01-01T00:00:01
formatting datetime: value = 2000-01-01 00:00:00, epoch = 946684800, date = 2000-01-01T00:00:00
formatting datetime: value = 2017-06-24 23:59:59, epoch = 1498348799, date = 2017-06-24T23:59:59
formatting datetime: value = 2018-06-24 09:44:00, epoch = 1529833440, date = 2018-06-24T09:44:00
formatting datetime: value = 2238-06-24 09:44:00, epoch = 8472332640, date = 2238-06-24T09:44:00
The Date::Parse documentation suggests it can handle dates at least as old as 1901-01-01. The Time::Local documentation suggests it should be able to handle dates even older.
How should I handle this oddity? Is there a better way to parse variable input formats?
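On the side note about error handling, here is a minimal sketch of one possible fix. It assumes str2time returns undef when it cannot parse the input, so testing with defined (rather than truthiness) lets an epoch of 0 through. The helper name parse_epoch and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Date::Parse qw(str2time);

# Hypothetical helper: returns epoch seconds, or undef when parsing fails.
sub parse_epoch {
    my ( $string, $zone ) = @_;
    return str2time( $string, $zone );
}

for my $string ( "1970-01-01 00:00:00", "2000-01-01 00:00:00", "xyz" ) {
    my $epoch = parse_epoch( $string, "GMT" );

    # defined() distinguishes "could not parse" (undef) from a valid epoch of 0
    if ( defined $epoch ) {
        print "parsed '$string' as epoch $epoch\n";
    }
    else {
        print "could not parse '$string'\n";
    }
}
```

This only changes how failures are detected; it does not address the 100-year shift.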
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by soonix (Canon) on Feb 19, 2018 at 20:33 UTC
Not exactly. Time::Local says that, depending on the Perl version, it relies on the system's time_t, which holds the number of seconds since 1970-01-01 00:00. That number is negative for datetimes before 1970-01-01.
You interpret the epoch value as unsigned, which makes it "jump" between 1969-12-31 23:59 and 1970-01-01 00:00.
How should I correctly interpret the value as signed?
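One way to read the value as signed, offered as a minimal sketch: the epoch is an ordinary Perl number, and it is the %u conversion in printf that displays negative values as huge unsigned integers. Formatting with %d keeps the sign. The variable names are illustrative, and the wrapped value in the comment assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Epoch seconds for 1969-01-01 00:00:00 UTC (negative because it precedes 1970).
my $epoch = -31536000;

my $signed   = sprintf "%d", $epoch;    # keeps the sign: -31536000
my $unsigned = sprintf "%u", $epoch;    # wraps on a 64-bit perl: 18446744073678015616

print "signed:   $signed\n";
print "unsigned: $unsigned\n";
```

DateTime->from_epoch accepts the negative value as it is, so only the display format needs to change.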
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by tangent (Parson) on Feb 19, 2018 at 22:55 UTC
The 100 years oddity would seem to be related to issue #105031 for Date::Parse:
After str2time uses strptime to break up the incoming date, it passes the result (with $year - 1900) to Time::Local::timelocal. timelocal uses a sliding window to determine if the year should be 19xx or 20xx, completely throwing away the *known* four-digit year that we sent to str2time.
Time::Local interprets the (two digit) date like this:
Years in the range 0..99 are interpreted as shorthand for years in the rolling "current century," defined as 50 years on either side of the current year. Thus, today, in 1999, 0 would refer to 2000, and 45 to 2045, but 55 would refer to 1955. Twenty years from now, 55 would instead refer to 2055. This is messy, but matches the way people currently think about two digit dates.
Years 1968 and 1969 mark the crossover of that window - i.e. 1968 is on one side of 2018 (50 years), and 1969 is on the other (49 years)...
my @dates = (
"1901-01-01 00:00:00",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
);
for my $string (@dates) {
my $epoch = str2time( $string, 'GMT' );
print "$string ($epoch seconds)\n";
my $date = DateTime->from_epoch( epoch => $epoch );
print $date->ymd, " ", $date->hms, "\n\n";
}
OUTPUT:
1901-01-01 00:00:00 (978307200 seconds)
2001-01-01 00:00:00
1968-12-31 23:59:59 (3124223999 seconds)
2068-12-31 23:59:59
1969-01-01 00:00:00 (-31536000 seconds)
1969-01-01 00:00:00
1969-12-31 23:59:59 (-1 seconds)
1969-12-31 23:59:59
1970-01-01 00:00:01 (1 seconds)
1970-01-01 00:00:01
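The sliding window can be sketched in a few lines. This is my paraphrase of the rule quoted above, not Time::Local's actual source; expand_two_digit_year and its explicit $current_year argument are hypothetical, added so the result does not depend on when the script runs.

```perl
use strict;
use warnings;

# Hypothetical sketch of the "rolling century" rule: a two-digit year is
# mapped into the window of roughly 50 years either side of $current_year.
sub expand_two_digit_year {
    my ( $yy, $current_year ) = @_;

    # Two-digit years above the breakpoint fall in the earlier century.
    my $breakpoint   = ( $current_year + 50 ) % 100;
    my $next_century = $current_year - $current_year % 100;
    $next_century += 100 if $breakpoint < 50;
    my $century = $next_century - 100;

    return $yy + ( $yy > $breakpoint ? $century : $next_century );
}

# What is left of each year after "$year - 1900", seen from 2018:
for my $yy ( 1, 68, 69 ) {
    printf "%02d -> %d\n", $yy, expand_two_digit_year( $yy, 2018 );
}
```

With 2018 as the current year, 1 (from 1901 - 1900) becomes 2001 and 68 becomes 2068, while 69 stays 1969, which matches the output above.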
To verify that the problem is with Date::Parse you can use Time::Local directly to show it returns the correct results if you give it the 4 digit year:
use Time::Local;
my %hash = (
"1901-01-01 00:00:00" => [00,00,00,01,00,1901],
"1968-12-31 23:59:59" => [59,59,23,31,11,1968],
"1969-01-01 00:00:00" => [00,00,00,01,00,1969],
"1969-12-31 23:59:59" => [59,59,23,31,11,1969],
"1970-01-01 00:00:01" => [01,00,00,01,00,1970],
);
for my $string (@dates) {
my $array = $hash{$string};
my $epoch = timegm( @$array );
print "$string ($epoch seconds)\n";
my $date = DateTime->from_epoch( epoch => $epoch );
print $date->ymd, " ", $date->hms, "\n\n";
}
OUTPUT:
1901-01-01 00:00:00 (-2177452800 seconds)
1901-01-01 00:00:00
1968-12-31 23:59:59 (-31536001 seconds)
1968-12-31 23:59:59
1969-01-01 00:00:00 (-31536000 seconds)
1969-01-01 00:00:00
1969-12-31 23:59:59 (-1 seconds)
1969-12-31 23:59:59
1970-01-01 00:00:01 (1 seconds)
1970-01-01 00:00:01
This explains my difficulty. Thank you. Now that I understand where the `$year - 1900` is happening, I can figure out how to reliably parse then format the input date.
I will post my updated code.
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by johngg (Canon) on Feb 19, 2018 at 23:27 UTC
The core Time::Piece module seems to handle most of your dates but anything earlier than the start of the 20th century can't be parsed and causes an error.
use strict;
use warnings;
use feature qw{ say };
use Time::Piece;
my @dates = (
q{1960-12-31 23:59:59},
q{1966-06-24 09:44:00},
q{1968-12-31 23:59:59},
q{1969-01-01 00:00:00},
q{1969-12-31 23:59:59},
q{1970-01-01 00:00:01},
q{2000-01-01 00:00:00},
q{2017-06-24 23:59:59},
q{2018-06-24 09:44:00},
q{2238-06-24 09:44:00},
q{1900-12-31 23:59:59},
q{1901-01-01 00:00:00},
q{1900-01-01 00:00:00},
q{1899-12-31 23:59:59},
);
foreach my $date ( @dates )
{
my $tp = Time::Piece->strptime( $date, q{%Y-%m-%d %T} );
say $date, q{ -> epoch }, $tp->epoch();
}
The output.
1960-12-31 23:59:59 -> epoch -283996801
1966-06-24 09:44:00 -> epoch -111161760
1968-12-31 23:59:59 -> epoch -31536001
1969-01-01 00:00:00 -> epoch -31536000
1969-12-31 23:59:59 -> epoch -1
1970-01-01 00:00:01 -> epoch 1
2000-01-01 00:00:00 -> epoch 946684800
2017-06-24 23:59:59 -> epoch 1498348799
2018-06-24 09:44:00 -> epoch 1529833440
2238-06-24 09:44:00 -> epoch 8472332640
1900-12-31 23:59:59 -> epoch -2177452801
1901-01-01 00:00:00 -> epoch -2177452800
1900-01-01 00:00:00 -> epoch -2208988800
Error parsing time at /usr/lib/x86_64-linux-gnu/perl/5.22/Time/Piece.pm line 469.
I hope this is of interest.
I am using Date::Parse to parse datetimes that may be in different formats. The dates have come in like YYYYMMDD, YYYY-MM-DD, YYYYMMDD HH:MM, etc. Time::Piece->strptime() requires a defined input format. Time::Piece would certainly work once the datetime is well formatted. Is there an advantage to Time::Piece over DateTime?
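If the set of layouts is small and known in advance, one possible way around the fixed-format requirement is to try several Time::Piece formats in turn and keep the first that parses. This is only a sketch: parse_any and the @formats list are hypothetical, the formats cover just the layouts mentioned here, and it relies on strptime dying on a mismatch (as in the "Error parsing time" message above).

```perl
use strict;
use warnings;
use Time::Piece;

# Candidate layouts, most specific first (an assumed list, extend as needed).
my @formats = (
    '%Y-%m-%d %H:%M:%S',
    '%Y-%m-%d %H:%M',
    '%Y%m%d %H:%M',
    '%Y-%m-%d',
    '%Y%m%d',
);

# Hypothetical helper: returns a Time::Piece object, or undef if no format fits.
sub parse_any {
    my ($string) = @_;
    for my $format (@formats) {
        # strptime dies on a mismatch, so trap the error and try the next format
        my $tp = eval { Time::Piece->strptime( $string, $format ) };
        return $tp if defined $tp;
    }
    return;
}

for my $string ( "20180220", "2018-02-20", "20180220 09:44", "xyz" ) {
    my $tp = parse_any($string);
    print "$string -> ", ( defined $tp ? $tp->datetime : "unparsed" ), "\n";
}
```

Unlike Date::Parse or Date::Manip, this accepts only the layouts listed, which may or may not suit free-form user input.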
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by eniad (Acolyte) on Feb 20, 2018 at 00:22 UTC
Solution 1
To handle the $year - 1900 bug mentioned by tangent, I have switched to the Date::Parse strptime function, which lets me ensure a 4-digit year. It fails for dates before 1000. I will address that if it comes up, which is unlikely but possible.
Here is my updated example script with better formatted output:
#!/usr/bin/perl
#---------------------------------------------------------------------
# format_date.pl
#
# format variable date inputs
#---------------------------------------------------------------------
use strict;
use warnings;
use Date::Parse;
use DateTime;
my $DEFAULT_TIME_ZONE = "GMT";
my @dates = (
"0618-01-01 00:00:00",
"1066-10-14 00:00:00",
"1899-06-24 09:44:00",
"1900-12-31 23:59:59",
"1901-01-01 00:00:00",
"1960-12-31 23:59:59",
"1966-06-24 09:44:00",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
"2000-01-01 00:00:00",
"2017-06-24 23:59:59",
"2018-06-24 09:44:00",
"2238-06-24 09:44:00"
);
# define format for printf statements
my $pstr = "%-19s %02s,%02s,%02s,%02s,%02s,%04s,%01s %-19s\n";
my $pfrm = "%19s %02u,%02u,%02u,%02u,%02u,%04u,%01u %19s\n";
printf( $pstr, "value", "ss", "mm", "hh", "dy", "mo", "year", "z", "date" );
foreach my $string (@dates) {
# format datetime field from any valid datetime input
# default time zone is used if timezone is not included in string
my @datetime = strptime( $string, $DEFAULT_TIME_ZONE );
# error if date is not correctly parsed
if ( scalar @datetime == 0 ) {
die( "ERROR ====> invalid datetime ($string), "
. "datetime format should be YYYY-MM-DD HH:MM:SS" );
}
my ( $ss, $mm, $hh, $day, $month, $year, $zone ) = @datetime;
if ( $year < 1000 ) { $year += 1900; }
my %datetimehash = (
year => $year,
month => $month + 1,
day => $day,
hour => $hh,
minute => $mm,
second => $ss,
nanosecond => 0,
time_zone => $zone,
);
my $date = DateTime->new(%datetimehash);
printf( $pfrm, $string, $ss, $mm, $hh, $day, $month, $year, $zone, $date );
}
exit 0;
Here is the output:
value ss,mm,hh,dy,mo,year,z date
0618-01-01 00:00:00 00,00,00,01,00,2518,0 2518-01-01T00:00:00
1066-10-14 00:00:00 00,00,00,14,09,1066,0 1066-10-14T00:00:00
1899-06-24 09:44:00 00,44,09,24,05,1899,0 1899-06-24T09:44:00
1900-12-31 23:59:59 59,59,23,31,11,1900,0 1900-12-31T23:59:59
1901-01-01 00:00:00 00,00,00,01,00,1901,0 1901-01-01T00:00:00
1960-12-31 23:59:59 59,59,23,31,11,1960,0 1960-12-31T23:59:59
1966-06-24 09:44:00 00,44,09,24,05,1966,0 1966-06-24T09:44:00
1968-12-31 23:59:59 59,59,23,31,11,1968,0 1968-12-31T23:59:59
1969-01-01 00:00:00 00,00,00,01,00,1969,0 1969-01-01T00:00:00
1969-12-31 23:59:59 59,59,23,31,11,1969,0 1969-12-31T23:59:59
1970-01-01 00:00:01 01,00,00,01,00,1970,0 1970-01-01T00:00:01
2000-01-01 00:00:00 00,00,00,01,00,2000,0 2000-01-01T00:00:00
2017-06-24 23:59:59 59,59,23,24,05,2017,0 2017-06-24T23:59:59
2018-06-24 09:44:00 00,44,09,24,05,2018,0 2018-06-24T09:44:00
2238-06-24 09:44:00 00,44,09,24,05,2238,0 2238-06-24T09:44:00
Thank you all for the help. It took a bit to get my head wrapped around this one.
EDIT: Solution 2
Based on discussion with thanos1983, I explored using the Date::Manip module to parse datetimes.
This simplified parsing variable inputs greatly. It even handles 2-digit years correctly.
Here is the updated code and output:
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;
use feature 'say';
my $DEFAULT_TIME_ZONE = "GMT";
my @dates = (
"0618-01-01 00:00:00", # interpreted as 2518-01-01
"1066-10-14 00:00:00",
"1899-06-24 09:44:00",
"1900-12-31 23:59:59",
"1901-01-01 00:00:00",
"1960-12-31 23:59:59",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
"2000-01-01 00:00:00",
"2018-02-20 00:00:00",
"20180220",
"02/20/2018",
"02/20/18", # interpreted as 1918-02-20
"2018-02-20",
"2238-02-20 09:44:00"
);
# print a header, then each formatted date next to its input
say "Well formatted date Variable input date";
say UnixDate( ParseDate($_), '%Y-%m-%d %T' ) . qq{ $_} for (@dates);
exit 0;
__END__
$ format_date.pl
Well formatted date Variable input date
0618-01-01 00:00:00 0618-01-01 00:00:00
1066-10-14 00:00:00 1066-10-14 00:00:00
1899-06-24 09:44:00 1899-06-24 09:44:00
1900-12-31 23:59:59 1900-12-31 23:59:59
1901-01-01 00:00:00 1901-01-01 00:00:00
1960-12-31 23:59:59 1960-12-31 23:59:59
1968-12-31 23:59:59 1968-12-31 23:59:59
1969-01-01 00:00:00 1969-01-01 00:00:00
1969-12-31 23:59:59 1969-12-31 23:59:59
1970-01-01 00:00:01 1970-01-01 00:00:01
2000-01-01 00:00:00 2000-01-01 00:00:00
2018-02-20 00:00:00 2018-02-20 00:00:00
2018-02-20 00:00:00 20180220
2018-02-20 00:00:00 02/20/2018
2018-02-20 00:00:00 02/20/18
2018-02-20 00:00:00 2018-02-20
2238-02-20 09:44:00 2238-02-20 09:44:00
my $formatter = DateTime::Format::Strptime->new(...);
So DateTime::Format::Strptime, DateTime:: Strptime
my $dt = $formatter->parse_datetime( $timestampstring );
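For reference, a fuller sketch of that approach, assuming input that already matches one fixed layout. The pattern, time_zone, and on_error values are my own choices for illustration and are not taken from the snippet above.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime;

# Assumed settings: one fixed pattern, UTC, and undef (not an exception) on failure.
my $formatter = DateTime::Format::Strptime->new(
    pattern   => '%Y-%m-%d %H:%M:%S',
    time_zone => 'UTC',
    on_error  => 'undef',
);

my $timestampstring = '1901-01-01 00:00:00';
my $dt = $formatter->parse_datetime($timestampstring);

if ( defined $dt ) {
    # The 4-digit year is kept as given; no sliding window is involved.
    print $dt->ymd, " ", $dt->hms, " (", $dt->epoch, " seconds)\n";
}
else {
    print "could not parse '$timestampstring'\n";
}
```

Input in any other layout comes back as undef with these settings, so variable input would still need normalizing first.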
my @dates = (
"2018-02-20 00:00:00",
"20180220",
"02/20/2018",
"02/20/18", # interpreted as 1918-02-20
"2018-02-20"
);
Date::Parse handles those (with some massaging). DateTime::Format::Strptime requires well formatted dates to begin with.
Am I missing a module of DateTime or Time::Piece that can handle the different input formats?
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by thanos1983 (Parson) on Feb 20, 2018 at 11:37 UTC
Hello eniad,
It seems that fellow Monks have already addressed your problem, but I want to add something minor here as well. Since there is a minor bug in the module, would you also consider using another module? For example, I put together a very simple example with my favorite module for date manipulation, Date::Manip.
This module accepts many date formats; for a brief overview see Date::Manip::Examples. It can convert a human-readable date to epoch and vice versa in one step.
I also included a minor comment in case you want to play with different time zone(s).
Sample code below:
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;
use feature 'say';
my @dates = ( "1899-06-24 09:44:00",
"1900-12-31 23:59:59",
"1901-01-01 00:00:00",
"1960-12-31 23:59:59",
"1966-06-24 09:44:00",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
"2000-01-01 00:00:00",
"2017-06-24 23:59:59",
"2018-06-24 09:44:00",
"2238-06-24 09:44:00" );
foreach my $datestr (@dates) {
my $epochSecs = UnixDate($datestr,'%s');
my $date = UnixDate( ParseDateString("epoch $epochSecs"), "%Y-%m-%d %T");
say "Date value = ".$datestr.", epoch = ".$epochSecs.", date = ".$date;
}
=timezone
my $timezone = UnixDate( Date_ConvTZ( "today", 'CET', 'PST' ), "%Y-%m-%d %T");
say $timezone;
=cut
__END__
$ perl test.pl
Date value = 1899-06-24 09:44:00, epoch = -2225459760, date = 1899-06-24 09:44:00
Date value = 1900-12-31 23:59:59, epoch = -2177456401, date = 1900-12-31 23:59:59
Date value = 1901-01-01 00:00:00, epoch = -2177456400, date = 1901-01-01 00:00:00
Date value = 1960-12-31 23:59:59, epoch = -284000401, date = 1960-12-31 23:59:59
Date value = 1966-06-24 09:44:00, epoch = -111165360, date = 1966-06-24 09:44:00
Date value = 1968-12-31 23:59:59, epoch = -31539601, date = 1968-12-31 23:59:59
Date value = 1969-01-01 00:00:00, epoch = -31539600, date = 1969-01-01 00:00:00
Date value = 1969-12-31 23:59:59, epoch = -3601, date = 1969-12-31 23:59:59
Date value = 1970-01-01 00:00:01, epoch = -3599, date = 1970-01-01 00:00:01
Date value = 2000-01-01 00:00:00, epoch = 946681200, date = 2000-01-01 00:00:00
Date value = 2017-06-24 23:59:59, epoch = 1498341599, date = 2017-06-24 23:59:59
Date value = 2018-06-24 09:44:00, epoch = 1529826240, date = 2018-06-24 09:44:00
Date value = 2238-06-24 09:44:00, epoch = 8472325440, date = 2238-06-24 09:44:00
Hope this helps, BR.
Seeking for Perl wisdom...on the process of learning...not there...yet!
my @dates = (
"2018-02-20 00:00:00",
"20180220",
"02/20/2018",
"02/20/18", # interpreted as 1918-02-20
"2018-02-20"
);
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;
use feature 'say';
my @dates = ( "2018-02-20 00:00:00",
"20180220",
"02/20/2018",
"02/20/18", # interpreted as 1918-02-20
"2018-02-20",
"today");
say UnixDate( ParseDate($_), "%Y-%m-%d") for (@dates);
__END__
$ perl test.pl
2018-02-20
2018-02-20
2018-02-20
2018-02-20
2018-02-20
2018-02-20
So in conclusion, yes the module can parse all the dates that you provided.
Update: To see which date formats the module accepts, read Date::Manip::Date/VALID DATE FORMATS. The same page also lists time formats and combined date and time formats.
Update 2: A minor similar example on how to parse time and print also time zone if you are interested:
Hope this helps, BR.
Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Date::Parse - how to correctly parse dates between 1901 and 1969
by haukex (Archbishop) on Feb 19, 2018 at 20:31 UTC
Crossposted to StackOverflow. Crossposting is acceptable, but it is considered polite to inform about it so that efforts are not duplicated.
Apologies! I considered linking, but got distracted. Thank you for linking to the crosspost.