PerlMonks  

Sort text string by the date embedded

by and_noel (Initiate)
on Oct 17, 2011 at 16:01 UTC ( [id://931960]=perlquestion )

and_noel has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: Sort text string by the date embedded
by zentara (Archbishop) on Oct 17, 2011 at 16:10 UTC
      Sorry.. first time submitting a question here. I won't let it happen again.

        I find it strange that you apologise before editing your post to insert the missing tags...

Re: Sort text string by the date embedded
by Marshall (Canon) on Oct 17, 2011 at 16:23 UTC
    Your post would be much more readable if you enclosed the code within <code>...</code> tags.

    The date format that you have is close to being sortable by a plain string comparison, because the years have 4 digits and the months and days always have 2 digits (with leading zeroes). Only the field order (month/day/year) is wrong.

    So in the sort, just rearrange the date fields into year, month, day order and use a single cmp.

    #!/usr/bin/perl -w
    use strict;

    my @strings = (
        "PROCESS_DT IN '01/01/2009'",
        "PROCESS_DT IN '05/23/2006'",
        "PROCESS_DT IN '01/01/2011'",
        "PROCESS_DT IN '04/19/2009'",
        "PROCESS_DT IN '07/01/2009'",
    );

    @strings = sort {
        my ($monthA, $dayA, $yearA) = $a =~ m|(\d+)/(\d+)/(\d+)|;
        my ($monthB, $dayB, $yearB) = $b =~ m|(\d+)/(\d+)/(\d+)|;
        "$yearA$monthA$dayA" cmp "$yearB$monthB$dayB"
    } @strings;

    print join("\n", @strings), "\n";

    __END__
    PROCESS_DT IN '05/23/2006'
    PROCESS_DT IN '01/01/2009'
    PROCESS_DT IN '04/19/2009'
    PROCESS_DT IN '07/01/2009'
    PROCESS_DT IN '01/01/2011'

      This worked perfectly and was simple. Thanks for the fast response

        Well, that depends upon a number of factors. The Schwartzian Transform (ST) pre-calculates what I did with the regex match and saves it for later use. This requires more memory copies and allocation, and then there is the transformation back into the original array.
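
For comparison, here is a minimal sketch of the same sort written as a Schwartzian Transform. The sample strings are the ones from the code above, and the YYYYMMDD key is built the same way; it assumes every string contains a date in MM/DD/YYYY form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

# Schwartzian Transform, read from the bottom map upwards:
#   1. map each string to [ original, sort key ] (the regex runs once per element)
#   2. sort the pairs on the precomputed key
#   3. map each pair back to the original string
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  {
        my ($month, $day, $year) = m|(\d+)/(\d+)/(\d+)|;
        [ $_, "$year$month$day" ]
    } @strings;

print join("\n", @sorted), "\n";
```

The extra cost mentioned above is visible here: one anonymous array per element is allocated and thrown away again after the sort.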

        Over the years, I've done benchmarks with the ST and without. What I have found is that this is not as important as it used to be. Perl's sort algorithm has gone through improvements, and in particular the worst-case performance has improved dramatically with the move from quicksort to merge sort. And I think that there have been other improvements "under the hood" that have made sort way faster than it once was.

        In a case like this, I would not at first blush worry about performance with N=100 or even N=1,000. With N=10,000 I would think about it. So with an array of 80,000 things, an ST is clearly going to be worthwhile if performance matters. With 1,000 it is often about a toss-up.

        I would say that the vast, vast majority of sorts that I do in Perl are on fewer than 100 things. I start thinking about performance considerations at about 1,000.

        For what it is worth, those are my "rules of thumb". Now part of this does have to do with "how expensive" it is to extract the relevant comparison data from $a and $b. In the OP's question, I didn't have to call any fancy date/time modules - just a very simple single regex got the job done. That matters.

        I don't have to save the result of a computation for use later if that computation wasn't "expensive" to begin with. And again, if I save that result, I have to make a new data structure, sort that, and then re-construct the original thing. For 100 things or less (most sorts), I don't see it. For sorting 1,000 things ST is worth thinking about. For sorting 10,000+ things you probably should be doing it.
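
Rules of thumb like these are easy to check on your own machine. The following is a rough sketch using the core Benchmark module; the generated test data, the value of $n, and the helper names plain_sort and st_sort are assumptions made up for this illustration. Change $n to 100, 1_000, 10_000 and so on to see where the ST starts to pay off for your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: $n strings with random dates in MM/DD/YYYY form.
my $n = 1_000;
my @strings = map {
    my ($m, $d, $y) = (1 + int(rand(12)), 1 + int(rand(28)), 2000 + int(rand(12)));
    sprintf("PROCESS_DT IN '%02d/%02d/%04d'", $m, $d, $y);
} 1 .. $n;

# Plain sort: the regex runs on both operands for every comparison.
sub plain_sort {
    return sort {
        my ($mA, $dA, $yA) = $a =~ m|(\d+)/(\d+)/(\d+)|;
        my ($mB, $dB, $yB) = $b =~ m|(\d+)/(\d+)/(\d+)|;
        "$yA$mA$dA" cmp "$yB$mB$dB"
    } @_;
}

# Schwartzian Transform: the regex runs once per element, before the sort.
sub st_sort {
    return
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  {
            my ($m, $d, $y) = m|(\d+)/(\d+)/(\d+)|;
            [ $_, "$y$m$d" ]
        } @_;
}

# Sanity check: both variants must return the same ordering.
my @p = plain_sort(@strings);
my @s = st_sort(@strings);
die "results differ\n" unless "@p" eq "@s";

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { my @r = plain_sort(@strings) },
    st    => sub { my @r = st_sort(@strings) },
});
```

The numbers will depend on the Perl version, the machine, and how expensive the key extraction is, so treat the output as a measurement of your own setup and nothing more.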

Re: Sort text string by the date embedded
by sundialsvc4 (Abbot) on Oct 18, 2011 at 00:46 UTC

    In a slightly different scenario that might nevertheless prove interesting, I actually put a SQLite database file to very good use. This is, of course, a no-cost, public domain(!) SQL database engine that stores everything in a single file and that runs on everything. So you could, for example, whip up a file (using Perl, of course) in which you put the strings as-is in one column and the various “interesting” values that you have parsed out of them into other columns. And once you have gone to that trouble, the payoff is that you can now just use SELECT ... ORDER BY. And you have just moved a whole lot of difficulty out of your application by pushing the whole job onto somebody else. In fact, you have probably just made “your application” a whole lot smaller and with a whole lot less “messy work” to do.

    The only caveat ... and it happens to be a big one ... is that with SQLite you must use transactions, because if you don’t, every write runs in its own implicit transaction and SQLite waits for the data to reach the disk after each one! (Ugh...) But this is what it was expressly designed to do, and, if you but keep that little fact in mind, it performs splendidly. (As in, “faster than a sonofabeech.”) You’ll probably find yourself using SQLite files more and more often, because they are “just like ordinary flat files, but ever-so-much more-so.”
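
A minimal sketch of that idea, assuming DBI and DBD::SQLite are installed from CPAN (neither ships with Perl). The table name lines, its two columns, and the in-memory database are assumptions for illustration; the sample strings are borrowed from the earlier reply. All inserts go inside one explicit transaction.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # assumption: DBI and DBD::SQLite are installed

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

# In-memory database for the sketch; give a file name instead to keep the data.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: the raw string plus the date rewritten as YYYY-MM-DD,
# which sorts correctly as plain text.
$dbh->do("CREATE TABLE lines (raw TEXT, process_dt TEXT)");

# One explicit transaction around all inserts, not one per row.
$dbh->begin_work;
my $ins = $dbh->prepare("INSERT INTO lines (raw, process_dt) VALUES (?, ?)");
for my $s (@strings) {
    my ($month, $day, $year) = $s =~ m|(\d+)/(\d+)/(\d+)|;
    $ins->execute($s, "$year-$month-$day");
}
$dbh->commit;

# Let the database do the sorting.
my $sorted = $dbh->selectcol_arrayref(
    "SELECT raw FROM lines ORDER BY process_dt");
print join("\n", @$sorted), "\n";

$dbh->disconnect;
```

For five strings this is overkill compared with a one-line sort; it starts to make sense when the parsed columns get reused for several different queries.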

Node Type: perlquestion [id://931960]
Approved by Corion