Sort text string by the date embedded

and_noel has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Sort text string by the date embedded by zentara (Archbishop) on Oct 17, 2011 at 16:10 UTC
What am I missing? Putting your code in code blocks is what is missing. Also see "sort based on date" and "How to get minimum start date in these start dates ?"
Re^2: Sort text string by the date embedded by and_noel (Initiate) on Oct 17, 2011 at 16:45 UTC
Sorry... first time submitting a question here. I won't let it happen again.
Re^3: Sort text string by the date embedded by Not_a_Number (Prior) on Oct 17, 2011 at 17:14 UTC
I find it strange that you apologise before editing your post to insert the missing tags...
Re^4: Sort text string by the date embedded by and_noel (Initiate) on Oct 17, 2011 at 17:20 UTC
Re^5: Sort text string by the date embedded by fisher (Priest) on Oct 17, 2011 at 19:39 UTC
Re: Sort text string by the date embedded by Marshall (Canon) on Oct 17, 2011 at 16:23 UTC
Your post would be much more readable if you enclosed the code within `<code>...</code>` tags.

The date format that you have is close to being sortable by a plain string comparison, because you have 4-digit years and the months and days have leading zeroes (they are always 2 digits). So in the sort, just reorder the pieces into year, month, day and use a single cmp instruction.

    #!/usr/bin/perl -w
    use strict;

    my @strings = (
        "PROCESS_DT IN '01/01/2009'",
        "PROCESS_DT IN '05/23/2006'",
        "PROCESS_DT IN '01/01/2011'",
        "PROCESS_DT IN '04/19/2009'",
        "PROCESS_DT IN '07/01/2009'",
    );

    @strings = sort {
        # pull month/day/year out of each string being compared
        my ($monthA, $dayA, $yearA) = $a =~ m|(\d+)/(\d+)/(\d+)|;
        my ($monthB, $dayB, $yearB) = $b =~ m|(\d+)/(\d+)/(\d+)|;

        # YYYYMMDD strings sort chronologically with a plain cmp
        "$yearA$monthA$dayA" cmp "$yearB$monthB$dayB"
    } @strings;

    print join("\n", @strings), "\n";

    __END__
    PROCESS_DT IN '05/23/2006'
    PROCESS_DT IN '01/01/2009'
    PROCESS_DT IN '04/19/2009'
    PROCESS_DT IN '07/01/2009'
    PROCESS_DT IN '01/01/2011'
Re^2: Sort text string by the date embedded by and_noel (Initiate) on Oct 17, 2011 at 17:19 UTC
This worked perfectly and was simple. Thanks for the fast response.
Re^2: Sort text string by the date embedded by AR (Friar) on Oct 17, 2011 at 16:53 UTC
This would be a great occasion to use the Schwartzian Transform.
Re^3: Sort text string by the date embedded by Marshall (Canon) on Oct 18, 2011 at 00:14 UTC
Well, that depends upon a number of factors. The ST pre-calculates what I did with the regex match and saves it for later use. This requires more memory copies and allocation, and then a transformation back into the original array.

Over the years, I've done benchmarks with the ST and without. What I have found is that this is not as important as it used to be. Perl's sort algorithm has gone through improvements, and in particular the worst-case performance has improved dramatically due to the switch from quicksort to merge sort. And I think that there have been other improvements "under the hood" that have made sort way faster than it once was.

In a case like this, I would not at first blush worry about performance with N=100 or even N=1,000. With N=10,000 I would think about it. So with an array of 80,000 things, an ST is clearly going to be worthwhile if performance matters. With 1,000 it is often about a toss-up. I would say that the vast, vast majority of sorts that I do in Perl are on fewer than 100 things. I start thinking about performance considerations at about 1,000. For what it is worth, those are my "rules of thumb".

Now part of this does have to do with "how expensive" it is to extract the relevant comparison data from $a and $b. In the OP's question, I didn't have to call any fancy date/time modules; a very simple single regex got the job done. That matters. I don't have to save the result of a computation for later use if that computation wasn't "expensive" to begin with. And again, if I save that result, I have to make a new data structure, sort that, and then reconstruct the original thing.

For 100 things or fewer (most sorts), I don't see it. For sorting 1,000 things, ST is worth thinking about. For sorting 10,000+ things, you probably should be doing it.
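For readers who have not met the Schwartzian Transform discussed above, here is a minimal sketch of how it might look for the same five sample strings. It assumes every string contains exactly one MM/DD/YYYY date; the variable name `@sorted` and the stricter `\d\d` / `\d{4}` pattern are illustrative choices, not something taken from the posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

# Schwartzian Transform, read from the bottom up:
#   1. map:  pair each string with a precomputed YYYYMMDD key
#   2. sort: compare only the precomputed keys
#   3. map:  discard the keys and keep the original strings
# Assumption: every string matches the MM/DD/YYYY pattern.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  {
        my ($month, $day, $year) = m|(\d\d)/(\d\d)/(\d{4})|;
        [ "$year$month$day", $_ ];
    } @strings;

print "$_\n" for @sorted;
```

The regex now runs once per string instead of twice per comparison, at the price of one small anonymous array per element. That is the trade-off described above: extra allocation up front in exchange for cheaper comparisons, which tends to pay off only as the list grows.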
Re: Sort text string by the date embedded by sundialsvc4 (Abbot) on Oct 18, 2011 at 00:46 UTC
In a slightly different scenario that might nevertheless prove interesting, I actually put a SQLite database file to very good use. This is, of course, a no-cost, public domain(!) SQL database engine that stores everything in a single file and that runs on everything.

So you could, for example, whip up a file (using Perl, of course) in which you put the strings as-is in one column and the various “interesting” values that you have parsed out of them into other columns. And once you have gone to that trouble, the payoff is that you can now just use `SELECT ... ORDER BY`. You have just taken a whole lot of difficulty out of your application by pushing the whole job onto somebody else. In fact, you have probably just made “your application” a whole lot smaller and with a whole lot less “messy work” to do.

The only caveat ... and it happens to be a big one ... is that with SQLite you must use transactions, because if you don’t, every statement is committed on its own and SQLite will sync the data to disk after every write! (Ugh...) But this is what it was expressly designed to do, and, if you but keep that little fact in mind, it performs splendidly. (As in, “faster than a sonofabeech.”) You’ll probably find yourself using SQLite files more and more often, because they are “just like ordinary flat files, but ever-so-much more-so.”
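A rough sketch of that approach, applied to the sample strings from earlier in the thread. It assumes the CPAN modules DBI and DBD::SQLite are installed; the table name `lines` and the columns `raw` and `process_dt` are hypothetical names chosen for illustration, and an in-memory database stands in for the single file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # assumption: DBI and DBD::SQLite are installed from CPAN

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

# In-memory database for the demo; a file name would work the same way.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: the original text plus the parsed date as YYYY-MM-DD.
$dbh->do("CREATE TABLE lines (raw TEXT, process_dt TEXT)");

# Wrap all inserts in one transaction so SQLite commits (and syncs) once.
$dbh->begin_work;
my $insert = $dbh->prepare("INSERT INTO lines (raw, process_dt) VALUES (?, ?)");
for my $s (@strings) {
    my ($month, $day, $year) = $s =~ m|(\d\d)/(\d\d)/(\d{4})|;
    $insert->execute($s, "$year-$month-$day");
}
$dbh->commit;

# Let the database do the sorting.
my $sorted = $dbh->selectcol_arrayref(
    "SELECT raw FROM lines ORDER BY process_dt");
print "$_\n" for @$sorted;

$dbh->disconnect;
```

Storing the date as YYYY-MM-DD text keeps `ORDER BY` a plain string comparison, the same trick as the cmp-based sort earlier in the thread. For five strings this is overkill; it starts to make sense when the parsed values are reused across several queries.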

