PerlMonks  

Extracting time/date from english text

by fce2 (Sexton)
on Oct 22, 2004 at 05:27 UTC

fce2 has asked for the wisdom of the Perl Monks concerning the following question:

We recently got a nice group scheduling server here at work. I've already written a module that speaks its language, which is working great. Now I'm working on some little tools to show it off. I want to be able to do something like this:
% echo "meetings friends for lunch; today at 1pm" | schedule.pl
I need to be able to extract the date and time and turn it into an epoch. However, I want it to accept as many different ways of expressing a time as it can, e.g.:
  • 22 Oct, 15:30
  • noon tomorrow
  • this afternoon at 3
  • 10am next tuesday
And so on. I figure that Date::Parse or something similar is going to be involved somewhere, but what else should I be looking at for this sort of thing? Any pointers to this kind of text processing would be appreciated.
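For a sense of what the hand-rolled end of this looks like, here is a minimal sketch using only the core Time::Local module rather than Date::Parse. It assumes the date phrase follows the last semicolon, and it understands only a time of day ("noon", "1pm", "10:30am", "15:30") plus an optional "tomorrow" or weekday name; absolute dates such as "22 Oct" are not handled. `parse_when` and `split_request` are hypothetical names, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %WDAY = (sun => 0, mon => 1, tue => 2, wed => 3,
            thu => 4, fri => 5, sat => 6);

# Hypothetical helper: turn a few English phrase shapes into an epoch,
# relative to $now (defaults to the current time). Understands a time of
# day plus an optional "tomorrow" or weekday name; anything else returns
# undef. Absolute dates such as "22 Oct" are NOT handled.
sub parse_when {
    my ($phrase, $now) = @_;
    $now = time unless defined $now;
    my $p = lc $phrase;

    my ($hour, $min);
    if ($p =~ /\bnoon\b/) {
        ($hour, $min) = (12, 0);
    } elsif ($p =~ /\b(\d{1,2})(?::(\d\d))?\s*([ap])m\b/) {
        ($hour, $min) = ($1 % 12, $2 || 0);
        $hour += 12 if $3 eq 'p';
    } elsif ($p =~ /\b(\d{1,2}):(\d\d)\b/) {
        ($hour, $min) = ($1, $2);
    } else {
        return;    # no recognisable time of day ("this afternoon at 3")
    }
    return if $hour > 23 || $min > 59;

    my @lt   = localtime $now;
    my $days = 0;
    if ($p =~ /\btomorrow\b/) {
        $days = 1;
    } elsif ($p =~ /\b(sunday|monday|tuesday|wednesday|thursday|friday|saturday)\b/) {
        # the next such weekday, 1..7 days ahead ("next" is not distinguished)
        $days = ($WDAY{ substr($1, 0, 3) } - $lt[6]) % 7 || 7;
    }

    # Step from local midnight to midday of the target date, then rebuild
    # the wall-clock time, so a DST change in between doesn't shift the hour
    my $midnight = timelocal(0, 0, 0, $lt[3], $lt[4], $lt[5] + 1900);
    my @day = localtime($midnight + $days * 86400 + 12 * 3600);
    return timelocal(0, $min, $hour, $day[3], $day[4], $day[5] + 1900);
}

# Hypothetical front end for schedule.pl: "what; when" -> (what, epoch)
sub split_request {
    my ($line, $now) = @_;
    my ($what, $when) = $line =~ /^(.*);\s*(.*?)\s*$/ or return;
    my $epoch = parse_when($when, $now);
    return defined $epoch ? ($what, $epoch) : ();
}
```

Under those assumptions, `split_request("meet friends for lunch; today at 1pm")` returns the description and the epoch for 1pm today. Anything richer ("22 Oct", "this afternoon at 3") is where a real parser has to take over.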

Re: Extracting time/date from english text
by Your Mother (Archbishop) on Oct 22, 2004 at 05:47 UTC

    Date::Manip is, as far as I know, the only one that does this. I think this has come up before, so you should Super Search it too. Here's a snippet I use from the command line; read the docs to see what's going on.

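    use strict;
    use Date::Manip;

    my $str_date = shift || die "give a date!\n"; # almost *any* format

    # A bare 7-10 digit number not starting with 19 or 20 is taken to be
    # seconds since the epoch; anything else goes to ParseDate as-is.
    $str_date = $str_date =~ /^(?!19|20)\d{7,10}$/
        ? ParseDateString("epoch $str_date")
        : $str_date;

    print UnixDate( ParseDate( $str_date ), "\t%D %X, %A\n");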
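The first branch of that snippet decides whether the argument is a raw epoch before anything reaches ParseDate. That test can be tried without Date::Manip; here is a small sketch with only core POSIX. The helper names are hypothetical, and POSIX strftime's %X and %A depend on the C library's locale, so the output may not match UnixDate exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper using the snippet's test: a 7-10 digit number that
# does not start with 19 or 20 is read as seconds since the epoch, so a
# YYYYMMDD string such as 20041022 is still left to the date parser.
sub looks_like_epoch {
    my ($s) = @_;
    return $s =~ /^(?!19|20)\d{7,10}$/ ? 1 : 0;
}

# Format an epoch with the same directives as the snippet's UnixDate call
sub describe_epoch {
    my ($epoch) = @_;
    return strftime("\t%D %X, %A", localtime $epoch);
}

for my $arg (@ARGV) {
    my $line = looks_like_epoch($arg)
        ? describe_epoch($arg) . "\n"
        : "'$arg' is not an epoch; hand it to ParseDate\n";
    print $line;
}
```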

    It gets all your examples but one.

    jinx@jasper[38]~/bin>dater "22 Oct, 15:30"
            10/22/04 15:30:00, Friday
    jinx@jasper[39]~/bin>dater "noon tomorrow"
            10/22/04 12:00:00, Friday
    jinx@jasper[40]~/bin>dater "this afternoon at 3"
    jinx@jasper[41]~/bin>dater "this afternoon"
    jinx@jasper[42]~/bin>dater "this afternoon at 3"
    jinx@jasper[43]~/bin>dater "today at 3"
    jinx@jasper[44]~/bin>dater "today at 3pm"
            10/21/04 15:00:00, Thursday
    jinx@jasper[45]~/bin>dater "10am next tuesday"
            10/26/04 10:00:00, Tuesday
Re: Extracting time/date from english text
by BrowserUk (Patriarch) on Oct 22, 2004 at 05:51 UTC

    Date::Manip does most of those.

    #! perl -slw
    use strict;
    use Date::Manip;

    while( <DATA> ) {
        chomp;
        my $date = ParseDate( $_ );
        # Pass $_ as an argument rather than interpolating it into the
        # format, so a '%' in the input can't be taken as a directive
        printf "'%s' contains the date \n '%s'\n\n",
            $_, UnixDate( $date ,"%T, %b %e, %Y.") || 'Not parsed';
    }

    __DATA__
    22 Oct, 15:30
    noon tomorrow
    this afternoon at 3
    10am next tuesday

    Outputs

    P:\test>401303
    '22 Oct, 15:30' contains the date
     '15:30:00, Oct 22, 2004.'

    'noon tomorrow' contains the date
     '12:00:00, Oct 23, 2004.'

    'this afternoon at 3' contains the date
     'Not parsed'

    '10am next tuesday' contains the date
     '10:00:00, Oct 26, 2004.'

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Extracting time/date from english text
by Eimi Metamorphoumai (Deacon) on Oct 22, 2004 at 12:54 UTC
    Others have mentioned Date::Manip, but it seems it can't find the date text; it only works if the entire text is the date spec. That is, it will handle "today at 1pm" but not "meetings friends for lunch; today at 1pm" (unless you make the semicolon a mandatory separator, but if that's the case you could also insist on a less general format).
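One way around that, without writing a grammar, is to let the parser do the recognising and search for the longest run of trailing words it accepts. A minimal sketch, assuming the date phrase comes at the end of the text as in the original example; `find_trailing_date` is a hypothetical name, and the parser is passed in as a code reference returning a false value on failure. Date::Manip's ParseDate returns an empty string when it can't parse, so `sub { ParseDate($_[0]) }` should fit, but that is untested here; the toy parser below only stands in for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find the longest run of trailing words in $text
# that $parser accepts. Returns (rest, phrase, value) or an empty list.
sub find_trailing_date {
    my ($text, $parser) = @_;
    my @words = split ' ', $text;
    for my $start (0 .. $#words) {             # longest candidate first
        my $phrase = join ' ', @words[$start .. $#words];
        my $value  = $parser->($phrase) or next;
        my $rest   = join ' ', @words[0 .. $start - 1];
        $rest =~ s/[;:,]\s*$//;                # drop the separator, if any
        return ($rest, $phrase, $value);
    }
    return;
}

# Toy stand-in for a real parser: accepts only "today|tomorrow at N[ap]m"
my $toy_parser = sub {
    my ($s) = @_;
    return '' unless $s =~ /^(today|tomorrow) at (\d{1,2})([ap])m$/i;
    my $hour = $2 % 12 + (lc($3) eq 'p' ? 12 : 0);
    return lc($1) . sprintf(' %02d:00', $hour);
};
```

This costs one parser call per word position, which is fine for a one-line command but gets slow on whole emails, where splitting into sentences first helps.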

    I've faced this before and decided I had to roll my own. Mine doesn't handle times, though, so it would need modification for your purposes. It's built to accept an email, scan it sentence by sentence looking for any dates, and then add them to my calendar file. It also tries to limit itself to dates in the future (so it'll add "tomorrow" but not "yesterday", and, because calendar only runs for me once a night, not "today"). Still, it might be a starting point if no one can provide a better solution for finding a date buried inside text. It produces lines like

    11/04/2004 Bruce Schneier :CRYPTO-GRAM, October 15, 2004:RSA Europe in Barcelona, on 4 November
    #!/usr/bin/perl -lw
    use diagnostics;
    use strict;
    use POSIX;
    use Time::Local;

    my ($from, $subject) = ("", "");

    # Headers: pick out From and Subject, stopping at the first blank line
    while(<>){
        last if /^$/;
        chomp;
        if (/^From: (.*)/i){
            $from = $1;
            $from =~ s/"?([^"]+)"?\s*<[^<]+>/$1/;    # "Name" <addr> -> Name
            $from =~ s/^([^,]+),\s*(.*)$/$2 $1/;    # Last, First -> First Last
        }
        if (/^Subject: (.*)/i){
            $subject = $1;
            $subject =~ s/\bRe://ig;
            $subject =~ s/\[.*?\]//ig;
            $subject =~ s/\s+/ /g;
            $subject =~ s/^ //;
            $subject =~ s/ $//;
        }
    }

    my %month;
    @month{qw( jan feb mar apr may jun jul aug sep oct nov dec )} = (1 .. 12);
    my $month = qr/
        \b(?:
            a(?:pr(?:il)? |ug(?:ust)?)
          |(?:dec|nov|sept?)(?:ember)?
          |(?:febr?|jan)(?:uary)?
          |ju(?:ly?|ne?)
          |ma(?:y|r(?:ch)?)
          |oct(?:ober)?
        )\b/ix;

    # Body: accumulate unquoted text, stopping at forwarded/quoted material
    my $text = "";
    while (<>){
        last if /^\s*-+Original Message-+$/;
        last if /^\s*Processing Initiated:/;
        last if /^Content-Type:\stext\/html/;
        next if /^[>:]/;
        next if /^On /;
        next if /\bwrote:\s*$/;
        chomp;
        $text .= $_ . " ";
    }

    # Check each rough sentence for a date in one of several formats
    for (split /(?<!\.\S)\.\s+|\s\s+/, $text){
        # day month [year]: "4 November", "4th of November, 2004"; $3 is the year
        if (/\b(?<![.\/\d:])(\d\d?)(?:(?:st|nd|rd|th)? of)? ($month)(?:,?\s+(\d\d\d\d))?\b/oi) {
            test_date_text($2, $1, $3);
        }
        # month day [year]: "November 4", "November 4th, 2004"
        elsif (/($month)\s+(\d\d?)(?:st|nd|rd|th)?,?\s*(\d\d\d\d)?\b/oi){
            test_date_text($1, $2, $3);
        }
        # 2004-11-04 or 2004/11/04
        elsif (/\b(?<![.\/\d:])(\d\d\d\d)([-\/])(\d\d?)\2(\d\d?)(?![.\/])\b/){
            test_date($3, $4, $1);
        }
        # 11/04, 11/04/04 or 11-04-2004 (US month/day order)
        elsif (/\b(?<![.\/\d:])(\d\d?)([-\/])(\d\d?)(?:\2((?:\d\d){1,2}))?(?![.\/])\b/){
            test_date($1, $3, $4);
        }
        elsif (/\btomorrow\b/i) {
            print_report(time() + 86400);
        }
        else {
            test_date_weekday();
        }
    }

    # Append one calendar line: date, tab, sender:subject:sentence
    sub print_report {
        my ($time) = shift;
        # my $comment = /ed\b|\b(?:(?:w|sp?)ent|left|took|w(?:as|ere)|came)\b/ ? "#" : "";
        open my $cal, '>>', "$ENV{HOME}/calendar"
            or die "Can't append to $ENV{HOME}/calendar: $!";
        # Only the date goes through strftime, so a '%' in the text is left alone
        print $cal strftime("%m/%d/%Y", localtime($time)), "\t$from:$subject:$_";
        close $cal;
    }

    # A bare weekday name ("Tuesday", "Wed") means its next occurrence
    sub test_date_weekday {
        return if /$month/;
        return unless /\b(?<!')(Sunday|(?:Mon|Tues?|Wed(?:nes)?|Thu(?:rs)?|Fri|Satur)(?:day)?)\b/i;
        my $weekday = lc substr($1, 0, 3);
        my %weekdays;
        @weekdays{qw( sun mon tue wed thu fri sat )} = (-1 .. 5);
        # 1..7 days ahead; today's weekday means a week from now
        print_report(time() + 86400 * (($weekdays{$weekday} - (localtime())[6]) % 7 + 1));
    }

    # Report the date if it is valid and in the future
    sub test_date {
        my ($m, $d, $y) = @_;
        if (defined $y){
            return unless $y =~ /^(?:20)?0/;    # only 2000-2009 or 00-09
        } else {
            $y = (localtime)[5];
        }
        my $time = 0;
        eval { $time = timelocal(0,0,0, $d, $m-1, $y) };
        if ($time > time){
            print_report($time);
            return 1;
        }
        return;
    }

    sub test_date_text {
        my ($m, $d, $y) = @_;
        return test_date($month{lc substr($m, 0, 3)}, $d, $y);
    }
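Two parts of that script can be lifted out and checked on their own: the sentence splitter, whose lookbehind keeps the period in "e.g. " from ending a sentence, and the weekday arithmetic from test_date_weekday. A sketch with hypothetical sub names; the regex is copied from the script, and the formula is the script's rewritten for localtime's 0 (Sunday) to 6 numbering instead of its -1..5 table, which gives the same 1 to 7 day result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split running text into rough sentences, as the script does: break on a
# period followed by whitespace (unless it ends an abbreviation like "e.g.")
# or on a run of two or more whitespace characters.
sub split_sentences {
    my ($text) = @_;
    return split /(?<!\.\S)\.\s+|\s\s+/, $text;
}

my %wday_num = (sun => 0, mon => 1, tue => 2, wed => 3,
                thu => 4, fri => 5, sat => 6);

# Days from $today_wday (0 = Sunday, as in localtime) until the next day
# called $name; always 1..7, so naming today's weekday means next week.
sub days_until {
    my ($name, $today_wday) = @_;
    my $target = $wday_num{ lc substr($name, 0, 3) };
    return unless defined $target;
    return ($target - $today_wday - 1) % 7 + 1;
}
```

Perl's `%` with a positive right operand never returns a negative number, which is what lets the formula wrap past Saturday without a special case.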

Node Type: perlquestion [id://401383]
Approved by FoxtrotUniform