(RhetTbull) Re: Sorting comma-delimited records
by RhetTbull (Curate) on May 31, 2002 at 18:02 UTC
|
#!/usr/bin/perl
use strict;
use warnings;
my @data = <DATA>;
chomp @data;
my @sorted = map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, (split /:/)[1] ] } @data;
print "data = \n@data\n";
print "sorted = \n@sorted\n";
__DATA__
area1:place1:name1
area1:place4:name2
area3:place3:name3
area5:place2:name2
Produces:
data =
area1:place1:name1 area1:place4:name2 area3:place3:name3 area5:place2:name2
sorted =
area1:place1:name1 area5:place2:name2 area3:place3:name3 area1:place4:name2
Update: For more information on the Schwartzian Transform, read Tom Christiansen's "Far More Than Everything You've Ever Wanted To Know About Sorting" paper.
Update 2: Changed example data to make it more obvious what was going on.
|
thanks.
Great help! I really appreciate your timely help, guys.
Re: Sorting
by Beatnik (Parson) on May 31, 2002 at 17:38 UTC
|
@a=qw(foo:baz foo:bar);
print sort { (split(/:/,$a))[1] cmp (split(/:/,$b))[1] } @a;
Although there are faster ways, like storing each record's second field as a hash key :)
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
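A minimal sketch of the "store each second field in a hash" idea mentioned above, assuming the records are colon-delimited strings. The sample records and the name %key_for are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records (hypothetical, for illustration only).
my @records = qw(area1:place4:name2 area3:place3:name3 area1:place1:name1);

# Split each record exactly once and remember its second field.
my %key_for;
$key_for{$_} = (split /:/)[1] for @records;

# The comparison block now does two hash lookups instead of two splits.
my @sorted = sort { $key_for{$a} cmp $key_for{$b} } @records;

print "$_\n" for @sorted;
```

The trade-off is the extra memory for %key_for, which holds one entry per distinct record.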
|
Hmm, comparing (split($a))[1] with (split($b))[1] was the first thing that popped into my mind, as well. But isn't that a waste of CPU cycles, splitting an element over and over again each time you wanna compare it to another? The other idea that popped into my head was extracting each "sortable" element once and storing them somewhere, (a few people had suggested a hash), so I guess it's a matter of speed or memory usage, no? For small data sets, this probably wouldn't be an issue, but maybe for larger data sets, it would. Unless the sort routine is more efficient than that, and it optimizes away rather nicely to avoid having to split the same string over and over.
Just babbling some random thoughts. Anybody have any random answers?
--
There are 10 kinds of people -- those that understand binary, and those that don't.
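One way to check the question above is simply to count the calls. Perl's sort runs the comparison block for every comparison and does not cache anything the block computes, so a split inside the block is repeated. The sketch below uses a made-up helper, second_field, and 50 generated records; both are assumptions for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 50 generated records with unique second fields (hypothetical data).
my @records = map { sprintf "area%d:place%02d:name%d", $_, ($_ * 7) % 50, $_ } 1 .. 50;

# Helper that counts how often a record gets split.
my $splits = 0;
sub second_field {
    my ($record) = @_;
    $splits++;
    return (split /:/, $record)[1];
}

# Naive version: split inside the comparison block.
my @naive = sort { second_field($a) cmp second_field($b) } @records;
my $naive_splits = $splits;

# Precomputed version: split once per record, then sort on the stored key.
$splits = 0;
my @precomputed =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, second_field($_) ] } @records;
my $precomputed_splits = $splits;

printf "naive: %d splits, precomputed: %d splits\n",
    $naive_splits, $precomputed_splits;
```

The precomputed version always splits once per record; the naive count depends on how many comparisons the sort needs, which grows faster than the number of records.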
|
Of course it's slow... that's why I'm saying a faster way would be using hashes, or complex data structures for that matter... TIMTOWTDI :)
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
|
What you are describing is the basic idea behind the Schwartzian Transform. See my write-up elsewhere in this thread for some links with more information. The idea is that you do the expensive operation (in this case, it's split) once and use a data structure to store the result. You then sort on the results and extract the original information when done. Our very own merlyn was the first (AFAIK) to apply his twisted mind to this problem and come up with a very perlish (or lispish depending on your mother tongue) method of doing this in one fell swoop using map.
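To make the three stages described above easier to see, here is the same transform unrolled into separate statements. This is only an illustrative sketch; the intermediate array names @decorated and @ordered and the sample records are assumptions, and the usual idiom chains the three steps into one expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records (hypothetical, for illustration only).
my @records = qw(
    area1:place4:name2
    area5:place2:name2
    area3:place3:name3
    area1:place1:name1
);

# Step 1: pair each record with its sort key (split runs once per record).
my @decorated = map { [ $_, (split /:/, $_)[1] ] } @records;

# Step 2: sort the pairs on the precomputed key.
my @ordered = sort { $a->[1] cmp $b->[1] } @decorated;

# Step 3: discard the keys and keep the original records.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

Reading the chained form from the bottom up gives the same three steps in the same order.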
Re: Sorting
by mfriedman (Monk) on May 31, 2002 at 17:41 UTC
|
I would recommend using an array of arrays and sorting the references to the arrays based on the value of the second element. For the sake of argument, I am going to assume that you have colon-delimited fields, one record per line, and that all the data has been loaded into $data.
#!/usr/bin/perl -w
use strict;
# Placeholder: supply your own sub; the parens keep 'use strict' from rejecting a bareword
my $data = get_data_from_somewhere();
# First split the data up into a 2D structure
my @struct;
for (split /\n/, $data) {
push @struct, [ split /:/ ]
}
# Now we sort the struct on the second element of the nested arrays
@struct = sort { $a->[1] cmp $b->[1] } @struct;
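A runnable variant of the array-of-arrays approach, as a sketch: a hard-coded sample string stands in for get_data_from_somewhere (an assumption for the sake of the example), and the sorted rows are joined back into colon-delimited lines at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for get_data_from_somewhere(): hypothetical sample data.
my $data = "area1:place4:name2\narea3:place3:name3\narea1:place1:name1\n";

# Split the data into a 2D structure: one array ref of fields per line.
my @struct;
for my $line (split /\n/, $data) {
    push @struct, [ split /:/, $line ];
}

# Sort the rows on the second field.
@struct = sort { $a->[1] cmp $b->[1] } @struct;

# Rebuild the colon-delimited lines from the sorted rows.
my @lines = map { join ':', @$_ } @struct;

print "$_\n" for @lines;
```

Keeping the fields split is handy if you need them again later; if you only need the sorted strings, the extra join step is the cost of this layout.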
Re: Sorting comma-delimited records
by vladb (Vicar) on May 31, 2002 at 17:49 UTC
|
You can store your records in a hash (just as Beatnik pointed out :) ), using each record's second field for the key.
use strict;
use Data::Dumper;
my @a = qw(foo:baz:faz foo:bar:fuss);
my %h= map{ (split(/\:/,$_))[1] => $_ } @a;
print Dumper(\%h);
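# to force a '\n' printed after each array element.
$,="\n";
# hashes are unordered, so sort the keys explicitly
print @h{sort keys %h};
Getting them inside a hash lets you look records up
by their second field. The hash itself has no order, so
sort the keys to get alphabetical order. Here's the output
(Dumper's own key order may vary unless $Data::Dumper::Sortkeys is set):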
$VAR1 = {
'bar' => 'foo:bar:fuss',
'baz' => 'foo:baz:faz'
};
foo:bar:fuss
foo:baz:faz
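One caveat with keying a hash on the second field: two records that share that field would collide, and only the last one would survive. A possible workaround, sketched here with made-up sample data and a hypothetical %by_field hash, is to store a list of records under each key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records; two of them share the second field 'bar' (hypothetical data).
my @records = qw(foo:baz:faz foo:bar:fuss qux:bar:zip);

# Group records under their second field instead of overwriting.
my %by_field;
for my $record (@records) {
    my $key = (split /:/, $record)[1];
    push @{ $by_field{$key} }, $record;
}

# Walk the keys in sorted order and flatten the groups.
my @sorted = map { @{ $by_field{$_} } } sort keys %by_field;

print "$_\n" for @sorted;
```

Records with the same key come out in the order they were read.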
_____________________
$"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+)-.*)$/;
$_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`;
print$\;}
Re: Sorting comma-delimited records
by Ovid (Cardinal) on May 31, 2002 at 18:32 UTC
|
Assuming each item is a record in an array, a Schwartzian will do the trick:
my @new_array =
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, get_sortable_item($_) ] }
@old_array;
sub get_sortable_item {
my $data = shift;
return (split /:/, $data, 3)[1];
}
Cheers,
Ovid
Update: Whoa! According to timestamps, I'm half an hour late with this node, but I swear that reply wasn't there when I just posted. Hmm... Oh well.
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
(jeffa) Re: Sorting comma-delimited records
by jeffa (Bishop) on May 31, 2002 at 23:53 UTC
|
use DBI;
use Data::Dumper;
use strict;
my $dir = '.';
my $file = 'simple_csv';
my $cols = [qw(one two three)];
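my $dbh = DBI->connect(
"DBI:CSV:f_dir=$dir;csv_eol=\n;csv_sep_char=:;",
undef, undef,   # no username or password; attributes go in the fourth argument
{RaiseError=>1},
);
$dbh->{csv_tables}->{$file} = { col_names => $cols };
# selectall_arrayref returns the rows themselves, not a statement handle
my $rows = $dbh->selectall_arrayref("
select one, two, three
from simple_csv
order by two
");
print Dumper $rows;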
This assumes that you are in the same directory as the CSV file and the CSV file is named 'simple_csv' - note there is no extension in the file name. Read the docs for more info. Here is the sample CSV file I used:
simple_csv
three:place3:baz
two:place2:bar
four:place4:qux
one:place1:foo
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)