PerlMonks  

Others have mentioned Date::Manip, but it seems it can't find a date inside surrounding text; it only works if the entire string is the date spec. That is, it will parse "today at 1pm" but not "meeting friends for lunch; today at 1pm" (unless you make the semicolon mandatory, but in that case you could just as well insist on a less general format).
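
To make that distinction concrete, here is a minimal sketch in core Perl. It is not Date::Manip's implementation; the pattern $date_re and the subs parse_whole and find_inside are hypothetical names invented for illustration, and the pattern only knows "today"/"tomorrow" with an optional time.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny date pattern: "today" or "tomorrow",
# optionally followed by "at <hour>am|pm". Real date grammars are far larger.
my $date_re = qr/\b(?:today|tomorrow)(?:\s+at\s+\d\d?\s*[ap]m)?/i;

# Whole-string parse: succeeds only if the entire input is a date spec.
sub parse_whole {
    my ($text) = @_;
    return $text =~ /\A\s*($date_re)\s*\z/ ? $1 : undef;
}

# Search: returns the first date spec found anywhere inside the input.
sub find_inside {
    my ($text) = @_;
    return $text =~ /($date_re)/ ? $1 : undef;
}

for my $text ("today at 1pm", "meeting friends for lunch; today at 1pm") {
    my $whole = parse_whole($text);
    my $found = find_inside($text);
    printf "%-45s whole=%s found=%s\n", qq("$text"),
        defined $whole ? $whole : "(none)",
        defined $found ? $found : "(none)";
}
```

The anchored version rejects the second sentence, while the unanchored search still pulls "today at 1pm" out of it. That second behaviour is what scanning free text needs.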

I've faced this before, and decided I had to roll my own. Mine doesn't worry about times, though, so it would need modification. It's built to accept an email and scan it, sentence by sentence, looking for any dates, then add them to my calendar file. It also tries to limit itself to dates in the future (so it'll add "tomorrow" but not "yesterday", and, because calendar only runs for me once a night, not "today"). So it'll need some changes for you, but it might be a starting point if no one can provide a better solution for finding a date buried inside text. It produces lines like

11/04/2004 Bruce Schneier :CRYPTO-GRAM, October 15, 2004:RSA Europe in Barcelona, on 4 November
    #!/usr/bin/perl -lw
    use diagnostics;
    use strict;
    use POSIX;
    use Time::Local;

    # Read the mail headers (up to the first blank line).
    my ($from, $subject) = ("", "");
    while(<>){
        last if /^$/;
        chomp;
        if (/^From: (.*)/i){
            $from = $1;
            $from =~ s/"?([^"]+)"?\s*<[^<]+>/$1/;   # drop the <address> part
            $from =~ s/^([^,]+),\s*(.*)$/$2 $1/;    # "Last, First" -> "First Last"
        }
        if (/^Subject: (.*)/i){
            $subject = $1;
            $subject =~ s/\bRe://ig;
            $subject =~ s/\[.*?\]//ig;
            $subject =~ s/\s+/ /g;
            $subject =~ s/^ //;
            $subject =~ s/ $//;
        }
    }

    my %month;
    @month{qw( jan feb mar apr may jun jul aug sep oct nov dec )} = (1 .. 12);
    my $month = qr/
        \b(?:
            a(?:pr(?:il)? |ug(?:ust)?)
            |(?:dec|nov|sept?)(?:ember)?
            |(?:febr?|jan)(?:uary)?
            |ju(?:ly?|ne?)
            |ma(?:y|r(?:ch)?)
            |oct(?:ober)?
        )\b/ix;

    # Collect the body, skipping quoted text and stopping at forwarded/HTML parts.
    my $text = "";
    while (<>){
        last if /^\s*-+Original Message-+$/;
        last if /^\s*Processing Initiated:/;
        last if /^Content-Type:\stext\/html/;
        next if /^[>:]/;
        next if /^On /;
        next if /\bwrote:\s*$/;
        chomp;
        $text .= $_ . " ";
    }

    # Split into rough sentences and try each date format in turn.
    for (split /(?<!\.\S)\.\s+|\s\s+/, $text){
        # The year is captured here so that $3 is actually set.
        if (/\b(?<![.\/\d:])(\d\d?)(?:(?:st|nd|rd|th)? of)? ($month)(?:,?\s+(\d\d\d\d))?\b/oi) {
            test_date_text($2, $1, $3);
        } elsif (/($month)\s+(\d\d?)(?:st|nd|rd|th)?,?\s*(\d\d\d\d)?\b/oi){
            test_date_text($1, $2, $3);
        } elsif (/\b(?<![.\/\d:])(\d\d\d\d)([-\/])(\d\d?)\2(\d\d?)(?![.\/])\b/){
            test_date($3, $4, $1);
        } elsif (/\b(?<![.\/\d:])(\d\d?)([-\/])(\d\d?)(?:\2((?:\d\d){1,2}))?(?![.\/])\b/){
            test_date($1, $3, $4);
        } elsif (/\btomorrow\b/i) {
            print_report(time() + 86400);
        } else {
            test_date_weekday();
        }
    }

    # Append one line for the current sentence ($_) to ~/calendar.
    sub print_report {
        my ($time) = shift;
        # my $comment = /ed\b|\b(?:(?:w|sp?)ent|left|took|w(?:as|ere)|came)\b/ ? "#" : "";
        open CALENDAR, ">>", "$ENV{HOME}/calendar"
            or die "Cannot open $ENV{HOME}/calendar: $!";
        # Keep $from/$subject out of the strftime format so a literal % in
        # them is not interpreted as a conversion.
        print CALENDAR strftime("%m/%d/%Y", localtime($time)) . "\t$from:$subject:$_";
        close CALENDAR;
    }

    sub test_date_weekday {
        return if /$month/;
        return unless
            /\b(?<!')(Sunday|(?:Mon|Tues?|Wed(?:nes)?|Thu(?:rs)?|Fri|Satur)(?:day)?)\b/i;
        my $weekday = lc substr($1, 0,3);
        my %weekdays;
        @weekdays{qw( sun mon tue wed thu fri sat )} = (-1 .. 5);
        # Next occurrence of that weekday, 1 to 7 days from now.
        print_report(time() + 86400 *
            (($weekdays{$weekday} - (localtime())[6]) % 7 + 1));
    }

    sub test_date {
        my ($m, $d, $y) = @_;
        if (defined $y){
            # As written, an explicit year must start with 0 or 200 (e.g. 04, 2004).
            return unless $y =~ /^(?:20)?0/;
        } else {
            $y = (localtime)[5];
        }
        my $time = 0;
        eval { $time = timelocal(0,0,0, $d, $m-1, $y) };   # invalid dates leave $time at 0
        if ($time > time){
            print_report($time);
            return 1;
        }
        return;
    }

    sub test_date_text {
        my ($m, $d, $y) = @_;
        return test_date($month{lc substr($m, 0, 3)}, $d, $y);
    }
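
The weekday arithmetic in test_date_weekday is compact, so a small worked sketch may help. The hash maps Sunday to -1 through Saturday to 5, one less than the weekday number localtime returns (Sunday = 0). The trailing + 1 puts that back, so the result is always 1 to 7 days ahead and never "today". The helper name days_ahead is hypothetical; it takes today's weekday as a parameter so it can be checked without consulting the clock.

```perl
use strict;
use warnings;

# Same table as in the script: Sunday => -1 ... Saturday => 5.
my %weekdays;
@weekdays{qw( sun mon tue wed thu fri sat )} = (-1 .. 5);

# Days from today's weekday (0 = Sunday .. 6 = Saturday, as in localtime)
# to the next occurrence of the named weekday. Always in the range 1..7.
sub days_ahead {
    my ($name, $today_wday) = @_;
    my $key = lc substr($name, 0, 3);
    return ($weekdays{$key} - $today_wday) % 7 + 1;
}

# Suppose today is Wednesday (3):
print days_ahead("Friday", 3), "\n";     # 2
print days_ahead("Wednesday", 3), "\n";  # 7 (next week, not today)
print days_ahead("Sunday", 3), "\n";     # 4
```

This relies on Perl's % operator returning a non-negative result when the right operand is positive, even if the left operand is negative (and no "use integer" is in effect).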

In reply to Re: Extracting time/date from english text by Eimi Metamorphoumai
in thread Extracting time/date from english text by fce2
