and_noel has asked for the wisdom of the Perl Monks concerning the following question:
Re: Sort text string by the date embedded
by zentara (Archbishop) on Oct 17, 2011 at 16:10 UTC
Sorry... first time submitting a question here. I won't let it happen again.
Re: Sort text string by the date embedded
by Marshall (Canon) on Oct 17, 2011 at 16:23 UTC
Your post would be much more readable if you enclosed the code within <code>...</code> tags.
The date format you have is very close to being sortable with a plain string comparison, because the years have 4 digits and the months and days always have 2 digits (leading zeros).
So in the sort, just rearrange each date into year, month, day order and compare the results with a single cmp.
#!/usr/bin/perl -w
use strict;
my @strings = (
"PROCESS_DT IN '01/01/2009'",
"PROCESS_DT IN '05/23/2006'",
"PROCESS_DT IN '01/01/2011'",
"PROCESS_DT IN '04/19/2009'",
"PROCESS_DT IN '07/01/2009'", );
@strings = sort {
    my ($monthA, $dayA, $yearA) = $a =~ m|(\d+)/(\d+)/(\d+)|;
    my ($monthB, $dayB, $yearB) = $b =~ m|(\d+)/(\d+)/(\d+)|;
    "$yearA$monthA$dayA" cmp "$yearB$monthB$dayB"
} @strings;
print join("\n",@strings),"\n";
__END__
PROCESS_DT IN '05/23/2006'
PROCESS_DT IN '01/01/2009'
PROCESS_DT IN '04/19/2009'
PROCESS_DT IN '07/01/2009'
PROCESS_DT IN '01/01/2011'
Well, that depends upon a number of factors. The ST (Schwartzian Transform) precomputes what my version does with a regex match inside each comparison and saves it for later use. That requires extra memory allocation and copying, plus a final transformation back into the original array.
Over the years, I've benchmarked sorts with and without the ST. What I have found is that it is not as important as it used to be. Perl's sort has been improved over time; in particular, its worst-case performance got dramatically better when the default algorithm switched from quicksort to mergesort. And I think there have been other improvements "under the hood" that have made sort much faster than it once was.
In a case like this, I would not, at first blush, worry about performance with N=100 or even N=1,000. With N=10,000 I would think about it. So with an array of 80,000 things, an ST is clearly going to be worthwhile if performance matters. With 1,000 it is often about a toss-up.
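For anyone who hasn't met the ST, here is a minimal sketch of the same sort written as a Schwartzian Transform, reusing the sample strings from the code above. Reading it bottom-up: the regex runs once per element to build a YYYYMMDD key, the sort compares only those precomputed keys, and the final map throws the keys away.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

my @sorted =
    map  { $_->[1] }                  # 3. keep only the original string
    sort { $a->[0] cmp $b->[0] }      # 2. compare the precomputed keys
    map  {                            # 1. pair each string with a YYYYMMDD key
        my ($m, $d, $y) = m{(\d+)/(\d+)/(\d+)};
        [ "$y$m$d", $_ ];
    } @strings;

print "$_\n" for @sorted;
```

The output order is the same as in the plain-sort version; the difference is only in how often the regex runs and how much temporary data is built.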
I would say that the vast, vast majority of sorts that I do in Perl are on less than 100 things. I start thinking about performance considerations at about 1,000.
For what it is worth, those are my "rules of thumb". Part of this does have to do with how "expensive" it is to extract the relevant comparison data from $a and $b. In the OP's question, I didn't have to call any fancy date/time modules; a single, very simple regex got the job done. That matters.
I don't need to save the result of a computation for later use if that computation wasn't "expensive" to begin with. And again, if I do save that result, I have to build a new data structure, sort that, and then reconstruct the original thing. For 100 things or less (most sorts), I don't see a benefit. For sorting 1,000 things, the ST is worth thinking about. For sorting 10,000+ things, you probably should be doing it.
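One way to test these rules of thumb on your own machine is a sketch like the following, using Perl's core Benchmark module. The data is randomly generated in the OP's format (hypothetical, not the OP's real data), and $N = 10_000 is an arbitrary choice; vary it to see where the crossover lands for you. No particular timings are claimed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: $N random strings in the OP's format.
my $N = 10_000;
my @strings = map {
    sprintf("PROCESS_DT IN '%02d/%02d/%04d'",
            1 + int(rand(12)), 1 + int(rand(28)), 2000 + int(rand(12)));
} 1 .. $N;

# Plain sort: runs the regex twice for every comparison.
sub plain_sort {
    return sort {
        my ($ma, $da, $ya) = $a =~ m{(\d+)/(\d+)/(\d+)};
        my ($mb, $db, $yb) = $b =~ m{(\d+)/(\d+)/(\d+)};
        "$ya$ma$da" cmp "$yb$mb$db";
    } @_;
}

# Schwartzian Transform: runs the regex once per element.
sub st_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { my ($m, $d, $y) = m{(\d+)/(\d+)/(\d+)}; [ "$y$m$d", $_ ] } @_;
}

# Make sure both approaches agree before timing them.
my @p = plain_sort(@strings);
my @s = st_sort(@strings);
die "results differ\n" unless "@p" eq "@s";

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { my @r = plain_sort(@strings) },
    st    => sub { my @r = st_sort(@strings) },
});
```

Because the regex here is cheap, the gap between the two is a fair test of the point above: the ST pays off mainly when key extraction is expensive or N is large.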
Re: Sort text string by the date embedded
by sundialsvc4 (Abbot) on Oct 18, 2011 at 00:46 UTC
In a slightly different scenario that might nevertheless prove interesting, I have put an SQLite database file to very good use. SQLite is a no-cost, public-domain(!) SQL database engine that stores everything in a single file and runs on just about everything. So you could, for example, whip up a file (using Perl, of course) with the strings as-is in one column and the various “interesting” values that you have parsed out of them in other columns. Once you have gone to that trouble, the payoff is that you can simply use SELECT ... ORDER BY. You have offloaded a whole lot of difficulty from your application by pushing the job onto somebody else. In fact, you have probably just made “your application” a whole lot smaller, with a whole lot less “messy work” to do.
The only caveat ... and it happens to be a big one ... is that with SQLite you must use transactions. If you don’t, every statement runs in its own transaction, and SQLite commits it, waiting for the data to reach the disk, after every single write! (Ugh...) But this is how it was expressly designed to work, and if you just keep that fact in mind, it performs splendidly. (As in, “faster than a sonofabeech.”) You’ll probably find yourself using SQLite files more and more often, because they are “just like ordinary flat files, but ever-so-much more-so.”
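A minimal sketch of that approach, assuming the DBI and DBD::SQLite modules are installed from CPAN. The table name lines and the columns raw and ymd are hypothetical. All inserts are wrapped in a single transaction, so SQLite commits once instead of once per row, and the sorting is left to ORDER BY.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # assumes DBI and DBD::SQLite are installed

# In-memory database for the demo; use a filename to keep the data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: original string plus a parsed, sortable date.
$dbh->do('CREATE TABLE lines (raw TEXT, ymd TEXT)');

my @strings = (
    "PROCESS_DT IN '01/01/2009'",
    "PROCESS_DT IN '05/23/2006'",
    "PROCESS_DT IN '01/01/2011'",
    "PROCESS_DT IN '04/19/2009'",
    "PROCESS_DT IN '07/01/2009'",
);

# One transaction around all inserts: a single commit at the end.
$dbh->begin_work;
my $sth = $dbh->prepare('INSERT INTO lines (raw, ymd) VALUES (?, ?)');
for my $s (@strings) {
    my ($m, $d, $y) = $s =~ m{(\d+)/(\d+)/(\d+)} or next;
    $sth->execute($s, "$y-$m-$d");
}
$dbh->commit;

# Let the database do the sorting.
my $rows = $dbh->selectcol_arrayref('SELECT raw FROM lines ORDER BY ymd');
print "$_\n" for @$rows;
$dbh->disconnect;
```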