Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I have an array of strings:
@origarray=('1|:|Test','2|:|Hash','3|:|Stuff','4|:|India');
I want to sort these strings by the names, ie. Test Hash India Stuff.
Is there an efficient way of doing this without creating an extra array to store these values in?
Dan
(tye)Re: Substring Sort
by tye (Sage) on Jun 01, 2001 at 01:26 UTC
|
Well, since you said "efficient", I'll offer my favorite:
@origarray= @origarray[
do {
my @by= map {(split(/\|/,$_,3))[2]} @origarray;
sort {$by[$a] cmp $by[$b]} 0..$#origarray
}
];
The above should be about as fast as you can get (though
none of that matters unless you are sorting a huge number
of items).
The "obvious" other two choices involve repeatedly
extracting the substring to be compared (can be much
slower but requires the least amount of memory), or
using the Schwartzian Transform which does something
similar to my method except it makes a ton of very small
arrays w/o names instead of one array with a name (falls
between the other two methods in speed but requires the
most memory).
Update 2: Thanks to arhuman for reminding me of
"my other favorite sorting scheme". His method is harder
for me to compare. I suspect it may be faster and require
only slightly more memory than mine above, at least for
some cases. In any case, it is at least close.
I tend
to use it when sorting on multiple fields, especially if
they constitute most of the data. I use my method above
to simultaneously sort multiple related lists (since none
of the other methods can do that) -- though you have to
save the list returned by sort to do that.
Updated to add missing ')'.
-
tye
(but my friends call me "Tye") | [reply] [d/l] |
(jeffa) Re: Substring Sort
by jeffa (Bishop) on Jun 01, 2001 at 00:40 UTC
|
Well, this doens't pass the 'extra array' requirement, but
if your data set is small enough, use a Schwartzian Transform:
my @sorted =
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { my ($n,$t) = split('|:|',$_,2); [$_,$t] } @origarray;
# change this to sort by lower case: [$_,lc($t)]
Jeff
R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
| [reply] [d/l] |
|
I can't test it nor benchmark it* but isn't the following code faster ?
my @sorted =
map { my ($n,$t) = split('|:|',$_,2); $t.'|:|'.$n }
sort
map { my ($n,$t) = split('|:|',$_,2); $t.'|:|'.$n } @origarray;
The idea behind is that default sort is always faster according to this excellent article about Efficient Perl sorting
(I couldn't cite this jewel too much...)
* I'm at home and my Perl environment is curently down
"Only Bad Coders Code Badly In Perl" (OBC2BIP)
| [reply] [d/l] |
|
Careful there... When you use the default sort and have extra data appended to each string, you have to make sure the extra data doesn't affect the sort order.
In this case, '|' (ASCII value 124) sorts after most other characters, so that code would sort a string before its prefixes. For example, 'string|extra data' comes after 'string plus|extra data', even though 'string' comes before 'string plus'.
I think the easiest way to solve this problem is by sticking a null character between the sort string and the extra data. (If the data contains null characters already, it takes a bit more work.)
@origarray = qw/ third|:|bb second|:|b fourth|:|c first|:|a /;
my @sorted = map { /\0(.*)/s }
sort
map { (split /\|:\|/)[-1] . "\0$_" }
@origarray;
print "@sorted\n";
That outer map could be done with substr() and index() instead of a regex. | [reply] [d/l] |
|
Eek, don't forget to escape the pipes in your split. Remember, just because there's quotes around it doesn't mean that it's not getting compiled as a regex.
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] |
Re: Substring Sort
by MeowChow (Vicar) on Jun 01, 2001 at 02:24 UTC
|
Boredom == Benchmarks.
tye's routine wins overall, however, arhuman's does scale better with huge lists. I also tossed in a method of my own, based on tye's approach, which I was curious about. Not too shabby, but it scales rather poorly.
| [reply] [d/l] [select] |
|
When I see such a huge disparity in benchmarks, the first thing I think is "oops, implemented one of them wrong".
I think the problem here is that myocom's sort is the only one where the return value from the subroutine is also the return value from sort, which means that when that function is called in a scalar context, sort ends up in a scalar context which causes it to just return undef and do no work.
I added @_= to the subs passed to Benchmark and got these results:
| [reply] [d/l] [select] |
|
Thanks for pointing that out! I knew something was wrong with those numbers, but for the life of me, couldn't figure out what. Tossing in a print statement to check the return value obviously didn't help things :-)
Regarding my sort routine, if you run my updated version (your scores look like the original) on a Perl 5.6.1, you should find that it's the fastest until about 10,000 elements, at which point arhuman's begins to take the lead:
Comparing for 1000 elements
Rate myo st tye ah mc
myo 18.4/s -- -61% -72% -78% -83%
st 46.9/s 154% -- -29% -43% -58%
tye 66.5/s 261% 42% -- -20% -40%
ah 82.7/s 348% 76% 24% -- -25%
mc 111/s 500% 136% 66% 34% --
Comparing for 2500 elements
Rate myo st tye ah mc
myo 7.46/s -- -54% -69% -76% -78%
st 16.3/s 118% -- -32% -47% -51%
tye 24.0/s 222% 48% -- -22% -28%
ah 31.0/s 315% 90% 29% -- -8%
mc 33.5/s 349% 106% 39% 8% --
Comparing for 10000 elements
Rate myo st tye mc ah
myo 1.36/s -- -53% -68% -76% -78%
st 2.88/s 112% -- -33% -49% -55%
tye 4.29/s 214% 49% -- -25% -32%
mc 5.69/s 317% 97% 33% -- -10%
ah 6.34/s 365% 120% 48% 12% --
Perl 5.6 gives rather different results though...
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] [d/l] |
|
Correct me if I'm wrong, but I believe the original poster wanted to sort those strings by a substring (rather than just sorting the substring)? So while you folks are going about your fancy, big city mapping, you're throwing away part of the data he wanted. :-)
Update: Say, they do get reconstructed, don't they. Still seems like an awful lot of code to get what he wanted, though...
Update part deux: Woohoo! Thanks for adding me into the benchmark. :-)
| [reply] |
|
If you look at the code, you'll see that the original strings get reconstructed or re-referenced in each case.
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] |
Re: Substring Sort
by Albannach (Monsignor) on Jun 01, 2001 at 00:50 UTC
|
print join',',sort {substr($a,rindex($a,'|:|')) cmp substr($b,rindex($b,'|:|'))} @origarray"
produces
2|:|Hash,4|:|India,3|:|Stuff,1|:|Test
--
I'd like to be able to assign to an luser | [reply] [d/l] [select] |
Re: Substring Sort
by myocom (Deacon) on Jun 01, 2001 at 00:53 UTC
|
use strict;
my @origarray=('1|:|Test','2|:|Hash','3|:|Stuff','4|:|India');
print join("\n",sort {(substr $a,4) cmp (substr $b,4)} @origarray);
| [reply] [d/l] |
Re: Substring Sort
by tachyon (Chancellor) on Jun 01, 2001 at 07:09 UTC
|
All this complexity. This works fine, assign instead of print if you want.
print sort map{/\|(\w+)$/}@origarray;
tachyon | [reply] [d/l] |
|
| [reply] |
|
|