Sorting colon-delimited records

venki has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
(RhetTbull) Re: Sorting comma-delimited records by RhetTbull (Curate) on May 31, 2002 at 18:02 UTC
First of all, those are colons, not semicolons. ;-) Sounds like a good place for a Schwartzian Transform by our own merlyn:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data = <DATA>;
chomp @data;

my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, (split /:/)[1] ] } @data;

print "data = \n@data\n";
print "sorted = \n@sorted\n";

__DATA__
area1:place1:name1
area1:place4:name2
area3:place3:name3
area5:place2:name2
```

Produces:

```
data =
area1:place1:name1 area1:place4:name2 area3:place3:name3 area5:place2:name2
sorted =
area1:place1:name1 area5:place2:name2 area3:place3:name3 area1:place4:name2
```

Update: For more information on the Schwartzian Transform, read Tom Christiansen's "Far More Than Everything You've Ever Wanted To Know About Sorting" paper.

Update 2: Changed example data to make it more obvious what was going on.
Re: (RhetTbull) Re: Sorting comma-delimited records by venki (Acolyte) on May 31, 2002 at 18:33 UTC
Thanks. Great help! I really appreciate your timely help, guys.
Re: Sorting by Beatnik (Parson) on May 31, 2002 at 17:38 UTC
Try something like...

```perl
@a = qw(foo:baz foo:bar);
print sort { (split(/:/, $a))[1] cmp (split(/:/, $b))[1] } @a;
```

although there are faster ways, like storing each second field in a hash as key :)

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: Re: Sorting by mephit (Scribe) on May 31, 2002 at 21:39 UTC
Hmm, comparing `(split($a))[1]` with `(split($b))[1]` was the first thing that popped into my mind, as well. But isn't that a waste of CPU cycles, splitting an element over and over again each time you wanna compare it to another? The other idea that popped into my head was extracting each "sortable" element once and storing them somewhere (a few people had suggested a hash), so I guess it's a matter of speed versus memory usage, no? For small data sets, this probably wouldn't be an issue, but for larger data sets it might be. Unless the sort routine is more efficient than that, and it optimizes things away rather nicely to avoid having to split the same string over and over.

Just babbling some random thoughts. Anybody have any random answers?

--
There are 10 kinds of people -- those that understand binary, and those that don't.
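One way to check the suspicion above is simply to count the calls. The following is a minimal sketch, not a benchmark: the records are made up, and `second_field()` is a hypothetical helper (not from the thread) that wraps `split` and bumps a counter each time it runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records, not taken from the original question.
my @records = qw(
    a:place8:x a:place3:x a:place6:x a:place1:x
    a:place7:x a:place2:x a:place5:x a:place4:x
);

my $split_calls = 0;

# Extract the second colon-delimited field, counting every call.
sub second_field {
    my ($record) = @_;
    $split_calls++;
    return (split /:/, $record)[1];
}

# Naive approach: split both operands on every comparison.
my @naive = sort { second_field($a) cmp second_field($b) } @records;
my $naive_calls = $split_calls;

# Schwartzian Transform: split each record exactly once, up front.
$split_calls = 0;
my @st =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, second_field($_) ] } @records;
my $st_calls = $split_calls;

print "naive sort:            $naive_calls split calls\n";
print "Schwartzian Transform: $st_calls split calls\n";
```

The transform always makes one call per record. The naive version makes two calls per comparison, and the number of comparisons depends on the data and on Perl's sort implementation, so the exact figure will vary.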
Re: Re: Re: Sorting by Beatnik (Parson) on May 31, 2002 at 21:54 UTC
Of course it's slow... that's why I'm saying a faster way would be using hashes, or complex data structures for that matter... TIMTOWTDI :)

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
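The hash idea mentioned above might look something like this. It is only a sketch under assumptions: the sample records and the name `%second_field` are invented for illustration, and this is one possible reading of "using hashes", not necessarily what the poster had in mind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records for illustration.
my @records = qw(
    area1:place4:name2
    area5:place2:name2
    area3:place3:name3
    area1:place1:name1
);

# Split each record once and remember its second field,
# keyed by the full record string.
my %second_field = map { $_ => (split /:/)[1] } @records;

# The comparator is now a pair of hash lookups, not a pair of splits.
my @sorted = sort { $second_field{$a} cmp $second_field{$b} } @records;

print "$_\n" for @sorted;
```

This trades memory (one hash entry per distinct record) for fewer `split` calls. Identical records share one entry, which is harmless here because they also share the same key.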
(RhetTbull) Re: Re: Re: Sorting by RhetTbull (Curate) on Jun 04, 2002 at 20:38 UTC
What you are describing is the basic idea behind the Schwartzian Transform. See my write-up elsewhere in this thread for some links with more information. The idea is that you do the expensive operation (in this case, it's split) once and use a data structure to store the result. You then sort on the results and extract the original information when done. Our very own merlyn was the first (AFAIK) to apply his twisted mind to this problem and come up with a very perlish (or lispish depending on your mother tongue) method of doing this in one fell swoop using map.
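The three steps described above can be unrolled into separate statements, which may make the one-liner easier to read. This is a sketch with made-up records; the intermediate names `@decorated` and `@ordered` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records for illustration.
my @records = qw(
    area1:place4:name2
    area5:place2:name2
    area3:place3:name3
    area1:place1:name1
);

# Step 1: pair each record with its sort key.
# The expensive split happens here, once per record.
my @decorated = map { [ $_, (split /:/)[1] ] } @records;

# Step 2: sort the pairs by the precomputed key.
my @ordered = sort { $a->[1] cmp $b->[1] } @decorated;

# Step 3: throw the keys away and keep the original records.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

The usual chained form is the same three steps read from bottom to top, with the output of each `map` or `sort` fed straight into the next.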
Re: Sorting by mfriedman (Monk) on May 31, 2002 at 17:41 UTC
I would recommend using an array of arrays and sorting the references to the arrays based on the value of the second element. For the sake of argument, I am going to assume that you have colon-delimited fields, one record per line, and that all the data has been loaded into $data.

```perl
#!/usr/bin/perl -w
use strict;

# get_data_from_somewhere() is a placeholder for however you load the records
my $data = get_data_from_somewhere();

# First split the data up into a 2D structure
my @struct;
for (split /\n/, $data) {
    push @struct, [ split /:/ ];
}

# Now we sort the struct on the second element of the nested arrays
@struct = sort { $a->[1] cmp $b->[1] } @struct;
```
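To get printable records back out of the sorted structure, each inner array can be joined on the colon again. A minimal sketch, assuming the data arrives as a single newline-separated string (the sample string here is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one colon-delimited record per line.
my $data = "area1:place4:name2\narea5:place2:name2\narea3:place3:name3\n";

# Build the array of arrays. The -1 limit keeps trailing empty fields,
# so joining the fields later reproduces the original record.
my @struct = map { [ split /:/, $_, -1 ] } split /\n/, $data;

# Sort the rows on their second field.
@struct = sort { $a->[1] cmp $b->[1] } @struct;

# Rebuild the colon-delimited lines from the sorted rows.
my @lines = map { join ':', @$_ } @struct;

print "$_\n" for @lines;
```

Keeping the fields split is handy if you need them again after sorting; if you only want the original lines back, a transform that carries the unsplit record along avoids the join.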
Re: Sorting comma-delimited records by vladb (Vicar) on May 31, 2002 at 17:49 UTC
You can store your records in a hash (just as Beatnik pointed out :) using each record's second field for the key.

```perl
use strict;
use Data::Dumper;

my @a = qw(foo:baz:faz foo:bar:fuss);
my %h = map { (split(/:/, $_))[1] => $_ } @a;

# dump the keys in sorted order so the output is repeatable
$Data::Dumper::Sortkeys = 1;
print Dumper(\%h);

# print a "\n" between the array elements
$, = "\n";
print @h{sort keys %h};
```

A hash by itself has no guaranteed order, so sort the keys when you pull the records back out; that gives you the records in alphabetical order of the second field. Note that records sharing the same second field would overwrite each other in the hash. Here's the output:

```
$VAR1 = {
          'bar' => 'foo:bar:fuss',
          'baz' => 'foo:baz:faz'
        };
foo:bar:fuss
foo:baz:faz
```

_____________________
$"=q;grep;;$,=q"grep";for(`find . -name ".saves~"`){s;$/;;;/(.-(\d+) +-.*)$/; $_=["ps -e -o pid \| "," $2 \| "," -v "," "];`@$_`?{print"+ $1"}:{print" +- $1"}&&`rm $1`; print$\;}
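A plain hash keeps only one record per key, so two records with the same second field would collide. One possible workaround, sketched here with made-up data and an illustrative `%by_field` name, is to store a list of records under each key and sort the keys explicitly:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records; two of them share the second field 'bar'.
my @records = qw(foo:baz:faz foo:bar:fuss zip:bar:zap);

# Group the records by their second field (a hash of arrays).
my %by_field;
for my $record (@records) {
    my $key = (split /:/, $record)[1];
    push @{ $by_field{$key} }, $record;
}

# Hash order is not defined, so sort the keys explicitly,
# then flatten each group back into one list.
my @sorted = map { @{ $by_field{$_} } } sort keys %by_field;

print "$_\n" for @sorted;
```

Records within one group stay in their original input order, since each is pushed onto its group as it is read.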
Re: Sorting comma-delimited records by Ovid (Cardinal) on May 31, 2002 at 18:32 UTC
Assuming each item is a record in an array, a Schwartzian will do the trick:

```perl
my @new_array =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, get_sortable_item($_) ] } @old_array;

sub get_sortable_item {
    my $data = shift;
    return (split /:/, $data, 3)[1];
}
```

Cheers,
Ovid

Update: Whoa! According to timestamps, I'm half an hour late with this node, but I swear that reply wasn't there when I just posted. Hmm... Oh well.

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
(jeffa) Re: Sorting comma-delimited records by jeffa (Bishop) on May 31, 2002 at 23:53 UTC
TIMTOWTDI via DBD::CSV:

```perl
use DBI;
use Data::Dumper;
use strict;

my $dir  = '.';
my $file = 'simple_csv';
my $cols = [qw(one two three)];

my $dbh = DBI->connect(
    "DBI:CSV:f_dir=$dir;csv_eol=\n;csv_sep_char=:;",
    undef, undef,            # no username or password
    { RaiseError => 1 },     # attributes go in the fourth argument
);

$dbh->{csv_tables}->{$file} = { col_names => $cols };

# selectall_arrayref returns the rows themselves, not a statement handle
my $rows = $dbh->selectall_arrayref("
    select one, two, three
    from simple_csv
    order by two
");

print Dumper $rows;
```

This assumes that you are in the same directory as the CSV file and the CSV file is named 'simple_csv' - note there is no extension in the file name. Read the docs for more info. Here is the sample CSV file I used:

simple_csv
three:place3:baz
two:place2:bar
four:place4:qux
one:place1:foo

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

