PerlMonks  

Re: How to get minimum start date in these start dates ?

by cog (Parson)
on Jul 10, 2006 at 07:38 UTC


in reply to How to get minimum start date in these start dates ?

I'm in a good mood, so I'll give you (untested) code:

my @sorted_dates =
    map  { $_->{'date'} }
    sort {
           $a->{'year'}  <=> $b->{'year'}
        or $a->{'month'} <=> $b->{'month'}
        or $a->{'day'}   <=> $b->{'day'}
    }
    map  {
        /(\d\d)-(\d\d)-(\d\d\d\d)/;
        +{ 'date' => $_, 'day' => $1, 'month' => $2, 'year' => $3 }
    }
    @start_date;

# The list is sorted in ascending order, so the earliest date comes first.
my $minimum_date = $sorted_dates[0];

Basically, I'm using a Schwartzian Transform in which I'm retrieving the elements of the date (year, month and day) and I'm sorting the dates by using those elements.

Search for "Schwartzian Transform" here in the Monastery if you don't know what it is. As for the rest, it's just a matter of sorting by year, by month if the years are the same, and by day if the months are the same.

Note that I'm assuming two-digit days and two-digit months.
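
If that assumption does not hold for your data, one option is to loosen the pattern so that it also accepts single-digit days and months; the numeric comparisons then still do the right thing. This is only a minimal sketch, assuming dd-mm-yyyy input with optional leading zeros. The sample dates and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: dd-mm-yyyy, where day and month may lack a leading zero.
my @start_date = qw(1-6-2007 01-08-2006 1-06-2006 01-7-2007 06-01-2007);

my @sorted_dates =
    map  { $_->{date} }
    sort {
           $a->{year}  <=> $b->{year}
        or $a->{month} <=> $b->{month}
        or $a->{day}   <=> $b->{day}
    }
    map  {
        # Capture into lexicals and fail loudly on anything unexpected.
        my ($day, $month, $year) = /^(\d{1,2})-(\d{1,2})-(\d{4})$/
            or die "Unrecognised date: $_";
        +{ date => $_, day => $day, month => $month, year => $year };
    }
    @start_date;

my $minimum_date = $sorted_dates[0];    # ascending sort, so the earliest is first
print "$minimum_date\n";                # prints 1-06-2006
```

Capturing into lexicals rather than reading $1, $2 and $3 directly also avoids silently reusing stale captures when a line fails to match.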

Re^2: How to get minimum start date in these start dates ?
by JediWizard (Deacon) on Jul 10, 2006 at 14:17 UTC

    cog, while I agree that a Schwartzian Transform is a good idea here, I would like to make two small comments.

    1. Rather than doing three comparisons (first on year, then month, then day), I believe (though I haven't benchmarked it) that it would be faster to do a single comparison on a string of the form yyyymmdd, which can easily be created with a regex.

    2. Depending on your data set, it may be considerably faster to use an Orcish Maneuver, especially if the same date may appear multiple times in the list (and I didn't necessarily see anything in the post to indicate that that wouldn't happen).

    my(%date_hash) = ();
    my(@start_date) = qw(01-06-2007 01-08-2006 01-06-2006 01-07-2007 06-01-2007);

    # Orcish Maneuver: compute each yyyymmdd key once and cache it in %date_hash.
    my @sorted_dates = sort {
        ($date_hash{$a} ||= &trans_date($a)) <=> ($date_hash{$b} ||= &trans_date($b))
    } @start_date;

    print join("\n", @sorted_dates);

    # Turn dd-mm-yyyy into yyyymmdd.
    sub trans_date {
        my $date = shift;
        $date =~ s/(\d{2})-(\d{2})-(\d{4})/$3$2$1/;
        return $date;
    }
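
For comparison, point 1 can also be combined with a Schwartzian Transform instead of a cache: build the yyyymmdd key once per element, then sort on that single key. This is a sketch only, using the same sample dates as the code above; the variable names are illustrative.

```perl
use strict;
use warnings;

my @start_date = qw(01-06-2007 01-08-2006 01-06-2006 01-07-2007 06-01-2007);

my @sorted_dates =
    map  { $_->[0] }                    # 3. unwrap the original date string
    sort { $a->[1] <=> $b->[1] }        # 2. one numeric comparison per pair
    map  {
        my ($day, $month, $year) = /^(\d{2})-(\d{2})-(\d{4})$/
            or die "Unrecognised date: $_";
        [ $_, "$year$month$day" ];      # 1. pair each date with its yyyymmdd key
    }
    @start_date;

print join("\n", @sorted_dates), "\n";
```

Unlike the Orcish Maneuver, this computes a key for every element even when dates repeat, but it needs no hash lookups inside the comparison block.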

    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

      While I don't disagree with you, I find the Schwartzian Transform easier to understand and memorize than an Orcish Maneuver, from the viewpoint of a newbie.

      Also, the benchmarking would depend largely on the data set (suppose all the years are different, for instance).

      Still, I'm inclined to believe that speed won't be relevant, in this case :-) Just a hunch, you know? :-)

        I agree with you, cog, regarding the ST over the OM, but that's probably because I've never used the OM in anger, so I'm not familiar with it. I think that both JediWizard's solution and yours overcomplicate the transformation of the date into a sortable form. Just reversing the date to sort it and then reversing it again to extract it seems much simpler and quicker to me. I have done some benchmarking which seems to bear this out. I've also corrected a couple of typos in the code as posted (you had missed a closing quote in one of your hash keys, but I've unquoted them all, and JediWizard had doubled his quote words, like qw(qw( ... ))). Here is the code:

        use strict;
        use warnings;

        use Benchmark qw(cmpthese);

        # Generate a thousand dates at random.
        #
        my @startDates;
        push @startDates,
            sprintf(q{%02d}, int((rand 28) + 1)) .
            q{-} .
            sprintf(q{%02d}, int((rand 12) + 1)) .
            q{-} .
            int((rand 25) + 2000)
            for (1 .. 1000);

        # cog's method.
        #
        my $rcCog = sub
        {
            my @sortedDates =
                map { $_->{date} }
                sort {
                    $a->{year} <=> $b->{year}
                    or
                    $a->{month} <=> $b->{month}
                    or
                    $a->{day} <=> $b->{day}
                }
                map {
                    /(\d\d)-(\d\d)-(\d\d\d\d)/;
                    { date => $_, day => $1, month => $2, year => $3 }
                }
                @startDates;
            return $sortedDates[0];
        };

        # JediWizard's method.
        #
        my $rcJediWizard = sub
        {
            my %dateHash = ();
            my @sortedDates =
                sort {
                    ($dateHash{$a} ||= transDate($a))
                    <=>
                    ($dateHash{$b} ||= transDate($b))
                }
                @startDates;
            return $sortedDates[0];
        };

        # johngg's method.
        #
        my $rcJohnGG = sub
        {
            return (
                map {join q{-}, reverse split /-/}
                sort
                map {join q{-}, reverse split /-/}
                @startDates
            )[0];
        };

        # Run all three on data to prove they come up with
        # the same answer.
        #
        print q{$rcCog->() - }, $rcCog->(), qq{\n};
        print q{$rcJediWizard->() - }, $rcJediWizard->(), qq{\n};
        print q{$rcJohnGG->() - }, $rcJohnGG->(), qq{\n};

        # Run the benchmark
        #
        cmpthese (50, {
            Cog        => $rcCog,
            JediWizard => $rcJediWizard,
            JohnGG     => $rcJohnGG
        });

        # JediWizard's date translation routine.
        #
        sub transDate
        {
            my $date = shift;
            $date =~ s/(\d{2})-(\d{2})-(\d{4})/$3$2$1/;
            return $date;
        }

        And these are the results

        $rcCog->() - 13-01-2000
        $rcJediWizard->() - 13-01-2000
        $rcJohnGG->() - 13-01-2000
                     Rate JediWizard        Cog     JohnGG
        JediWizard 6.00/s         --        -0%       -61%
        Cog        6.00/s         0%         --       -61%
        JohnGG     15.2/s       153%       153%         --

        Looks like your hunch about speed was correct in that you and JediWizard pan out about the same (seems to go either way over several runs but the one I captured here was a dead heat). However, my simpler solution appears to be consistently quicker.
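
To see why the reversal works, here is the same idea unrolled on a small list. It is a sketch only: the sample dates are borrowed from JediWizard's post and the intermediate variable names are just for illustration. Splitting on the hyphen and reversing the fields turns dd-mm-yyyy into yyyy-mm-dd, which sorts chronologically as a plain string, provided every field is zero-padded to a fixed width.

```perl
use strict;
use warnings;

my @startDates = qw(01-06-2007 01-08-2006 01-06-2006 01-07-2007 06-01-2007);

# dd-mm-yyyy -> yyyy-mm-dd
my @isoDates = map { join q{-}, reverse split /-/ } @startDates;

# The default sort compares strings, which is enough for yyyy-mm-dd.
my @sortedIso = sort @isoDates;

# yyyy-mm-dd -> dd-mm-yyyy again
my @sortedDates = map { join q{-}, reverse split /-/ } @sortedIso;

print "$sortedDates[0]\n";    # the earliest date, 01-06-2006
```

No regex captures, hash lookups, or custom comparison block are involved here, which is a plausible reason for the difference in the figures above.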

        I hope this is of interest.

        Cheers,

        JohnGG
