PerlMonks  

Sorting by column

by Sun751 (Beadle)
on Sep 15, 2009 at 03:10 UTC

Sun751 has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which lists the file permission, main directory and filename. I am trying to sort it.
I read it into an array and tried to sort it by its 2nd column like this:
@array = sort { (split(/\s/, $a))[1] cmp (split(/\s/, $b))[1] } @array;

But the problem is that when the 2nd column is the same, I want it to be sorted by the 3rd column.
2755 home
444 home/backup appletest.txt
444 home/backup dhl.txt
444 home/support appletest.bat
2755 bin
755 bin/backup env.txt
755 bin/support arc.bat
755 bin/backup aus.txt
2755 etc
644 etc/backup appletest.txt
644 etc/support arc.bat
644 etc/support dhl.bat
644 etc/support env.bat
Any idea how this can be done? Please.
Cheers

Replies are listed 'Best First'.
Re: Sorting by column
by ikegami (Patriarch) on Sep 15, 2009 at 04:03 UTC
    You know what cmp returns, right? When there's a tie, it returns zero. So when it's zero, you want to compare the 3rd columns.
    @array = sort {
        my @cols_a = split /\s/, $a;
        my @cols_b = split /\s/, $b;
        $cols_a[1] cmp $cols_b[1] || $cols_a[2] cmp $cols_b[2]
    } @array;
      Or for a more general solution that doesn't require adding || after || after || and which uses data to drive the sort instead of hard-coding it, check out Sort::MultipleFields. You'll need to turn your data into an array of hashrefs first and then back again afterwards.
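The data-driven idea can also be sketched by hand, without any module. This is only an illustration and not how Sort::MultipleFields works: `make_column_sorter` is a hypothetical helper that takes 0-based column indices and returns a comparator. It assumes Perl 5.10 or later (for `//`) and treats a missing column as an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparator from a list of 0-based column indices.
sub make_column_sorter {
    my @cols = @_;
    return sub {
        my ($x, $y) = @_;
        my @fx = split ' ', $x;
        my @fy = split ' ', $y;
        for my $i (@cols) {
            # A missing column is compared as an empty string (an assumption).
            my $r = ($fx[$i] // '') cmp ($fy[$i] // '');
            return $r if $r;    # first non-zero result decides the order
        }
        return 0;               # all listed columns are equal
    };
}

my $by_path_then_name = make_column_sorter(1, 2);
my @rows = (
    "444 home/backup dhl.txt",
    "2755 home",
    "444 home/backup appletest.txt",
    "755 bin/backup env.txt",
);
my @sorted = sort { $by_path_then_name->($a, $b) } @rows;
print "$_\n" for @sorted;
```

Adding another tie-breaker is then a matter of passing one more index rather than writing another `||` clause.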
Re: Sorting by column
by desemondo (Hermit) on Sep 15, 2009 at 04:05 UTC
    I'm a fan of the GRT sort. I like it because it allows multiple keys to be sorted simultaneously in one action. Whether this is a narrow-minded view I have yet to learn... (which I'm still doing :p)

    Full info if you're interested: Advanced Sorting - GRT - Guttman Rosler Transform
    Original whitepaper discussing sorting in general and GRT: A Fresh Look at Efficient Perl Sorting

    The basic structure of the grt is:
    map { unpack or substr } sort map { pack [template, $var1, $var2, etc, $string_or_reference] }
    with the main principle being to let Perl use its highly optimised default sorting algorithm. Once you start customising how the sort function compares items, it can get very slow and inefficient pretty easily. With the GRT sort you call the sort function either as plain old sort or reverse sort.

    If you really don't care about how sorting works, take a look at Sort::Maker. It takes a series of arguments and creates a sorting sub for you based on the criteria you provide.

    If you are interested in how a GRT sort could be applied to your problem, keep reading. :) (And this is by no means the one and only solution. I'm actually interested to see other monks' comments, and whether my solution below could be improved further.)

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my @original_array = <DATA>;
    foreach (@original_array) {
        print "$_";
    }

    my @sorted_array = map {
        substr($_, 68);
        # unpack('x68 A*', $_)  # achieves same thing but apparently less efficient...
    } sort map {
        my ($perm, $path, $filename) = split /\s+/, $_;
        pack 'N A32 A32 A*', $perm, $path, $filename, $_;
    } (@original_array);

    print "\n\n";
    foreach (@sorted_array) {
        print "$_";
    }

    __DATA__
    2755 home
    444 home/backup appletest.txt
    444 home/backup dhl.txt
    444 home/support appletest.bat
    2755 bin
    755 bin/backup env.txt
    755 bin/support arc.bat
    755 bin/backup aus.txt
    2755 etc
    644 etc/backup appletest.txt
    644 etc/support arc.bat
    644 etc/support dhl.bat
    644 etc/support env.bat
    From what I've learnt so far, the pack is essentially prepending a fixed-width header to the front of each line of text. The sort function compares that header first, and only reads past it when two headers are identical, before deciding whether one line is lt, gt, or eq the other.

    The substr (and unpack), essentially strip off the header portion and return into the array @sorted_array the original line of text.
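As a variation on the same header idea, here is a minimal sketch that builds the header from the 2nd and 3rd columns only, which is the order the original question asked for. It assumes paths and filenames fit in 32 bytes each (`A32` silently truncates longer values), that a missing filename can be treated as an empty string, and Perl 5.10 or later for `//`. The sample rows are taken from the question.

```perl
use strict;
use warnings;

my @lines = (
    "444 home/support appletest.bat",
    "2755 home",
    "444 home/backup dhl.txt",
    "444 home/backup appletest.txt",
);

my @sorted =
    map { substr($_, 64) }               # strip the 64-byte header again
    sort                                  # plain default string sort
    map {
        my (undef, $path, $file) = split ' ', $_;
        # Header: path and filename, each space-padded to 32 bytes,
        # followed by the untouched original line.
        pack('A32 A32', $path, $file // '') . $_;
    } @lines;

print "$_\n" for @sorted;
```

Because the padding character is a space, a short path such as `home` sorts ahead of `home/backup`, which matches what `cmp` would do for this data.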

    Hope this is helpful to you
Re: Sorting by column
by alexlc (Beadle) on Sep 15, 2009 at 04:04 UTC

    You're most of the way there. cmp returns 0 if the values are equal. All you need to do in your sort routine is assign the value of your initial cmp to an intermediate variable. If that variable is not zero, then you return that value. If that variable is zero, then you do a second compare of the 3rd item ( 2nd index ) of each row, and use that as the result.
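The intermediate-variable approach described above might look like the following sketch. The sub name `by_second_then_third` is made up for illustration, missing columns are assumed to compare as empty strings, and `//` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

sub by_second_then_third {
    my @col_a = split ' ', $a;
    my @col_b = split ' ', $b;

    # First comparison: the 2nd column (index 1).
    my $result = ($col_a[1] // '') cmp ($col_b[1] // '');
    return $result if $result != 0;

    # Tie on the 2nd column: fall back to the 3rd column (index 2).
    return ($col_a[2] // '') cmp ($col_b[2] // '');
}

my @array = (
    "644 etc/support env.bat",
    "644 etc/backup appletest.txt",
    "2755 etc",
    "644 etc/support arc.bat",
);
my @sorted = sort by_second_then_third @array;
print "$_\n" for @sorted;
```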

    As an alternative ( and in this case probably easier ) solution, you could also just compare the 2 right hand columns as strings. Try using split with the limit option.

    sort {
        my (undef, $a_val) = split(/\s+/, $a, 2);
        my (undef, $b_val) = split(/\s+/, $b, 2);
        $a_val cmp $b_val;
    }
    -- AlexLC
Re: Sorting by column
by jbt (Chaplain) on Sep 15, 2009 at 03:54 UTC
    One way to do it:

    sub by_row {
        my ($first, $second) = ($a, $b);
        # Drop the leading permission column; use $1 (not \1) in the replacement.
        $first  =~ s/\d+\s+(.*)/$1/;
        $second =~ s/\d+\s+(.*)/$1/;
        $first cmp $second;
    }
    @array = sort by_row @array;
Re: Sorting by column
by ph0enix (Friar) on Sep 16, 2009 at 11:55 UTC

    Try the following column-oriented sort script. Columns are expected to be delimited by one or more whitespace characters. You can specify the column(s) and column type to be used for sorting.

    #!/usr/bin/perl -w
    #
    # Description:
    #   Column oriented sort utility
    #

    my @columns;
    my $column;
    my $type;
    my $first;
    my $line;
    my $sort_code;
    my $sort_tmpl = q|sub sort_sub {
        my @col_a = split(/\s+/, $a);
        my @col_b = split(/\s+/, $b);
        __SORT_CODE__
    }|;

    sub print_usage {
        my $exit = shift || 0;
        my $error = shift;

        print qq|Usage: $0 <column>[,<column>]+ [file]

    Column oriented sort. Input data columns are expected to be delimited
    by one or more whitespaces.

    Param(s):
      column - number of column (starting from 1) to be used for sorting.
               Use suffix to specify column type. Available suffixes
                 a - column is text (default)
                 d - column is decimal number
                 x - column is hexadecimal number
      file   - source data file. Data are read from STDIN if omitted

    Example:
      $0 3d,2x,4 /input/data

      Sort content of /input/data using
        1. 3rd column as decimal number,
        2. 2nd column as hexadecimal number and
        3. finally 4th column as text

    |;
        print "ERROR: $error\n" if ($error);
        exit $exit;
    }

    print_usage(1, "Missing params") if (@ARGV < 1);
    print_usage(0) if ($ARGV[0] =~ /^--?h(e(lp?)?)?/);

    @columns = split(/,/, shift @ARGV);
    print_usage(1, "Incorrect 'column(s)' param") if (!@columns);

    $first = 1;
    $line = <>;
    for (@columns) {
        $sort_code .= "|| " if (!$first);
        if ($_ =~ /^(\d+)(.)?/) {
            $column = $1 - 1;
            $type = $2 ? lc($2) : 'a';
            $type = 'a' if ($type !~ /^(?:a|d|x)$/);
            if ($type eq 'x') {
                $sort_code .= "hex(\$col_a[$column]) <=> hex(\$col_b[$column]) ";
            } elsif ($type eq 'd') {
                $sort_code .= "\$col_a[$column] <=> \$col_b[$column] ";
            } else {
                $sort_code .= "\$col_a[$column] cmp \$col_b[$column] ";
            }
        } else {
            print_usage(1, "Incorrect 'column(s)' param");
        }
        $first = 0;
    }
    $sort_code .= ";\n";
    $sort_tmpl =~ s/__SORT_CODE__/$sort_code/;
    eval $sort_tmpl;
    print sort sort_sub $line, <>;
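The same column spec (for example `3d,2x,4`) could also be turned into a comparator with a closure instead of generating code for a string `eval`. The following is a rough sketch of that alternative, not a rewrite of the script above: `make_sorter` is a hypothetical name, an unknown suffix is rejected rather than silently treated as text, missing columns are compared as empty strings, and `//` requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a spec such as "3d,2x,4" into a comparator.
sub make_sorter {
    my ($spec) = @_;
    my @keys;
    for my $item (split /,/, $spec) {
        my ($num, $type) = $item =~ /^(\d+)([adx]?)$/i
            or die "Incorrect column spec '$item'\n";
        # Store the 0-based column index and its type (default: text).
        push @keys, [ $num - 1, lc($type) || 'a' ];
    }
    return sub {
        my @col_a = split ' ', $_[0];
        my @col_b = split ' ', $_[1];
        for my $key (@keys) {
            my ($i, $type) = @$key;
            my ($x, $y) = ($col_a[$i] // '', $col_b[$i] // '');
            my $r = $type eq 'd' ? $x <=> $y
                  : $type eq 'x' ? hex($x) <=> hex($y)
                  :                $x cmp $y;
            return $r if $r;    # first differing column decides
        }
        return 0;
    };
}

my $cmp = make_sorter('1d,3');   # 1st column as a number, then 3rd as text
my @rows = (
    "755 bin/backup env.txt",
    "2755 bin",
    "755 bin/backup aus.txt",
    "644 etc/support arc.bat",
);
my @sorted = sort { $cmp->($a, $b) } @rows;
print "$_\n" for @sorted;
```

The closure keeps the parsed spec as data, so no Perl source has to be assembled at run time; the trade-off is a small per-comparison loop over the key list.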

Node Type: perlquestion [id://795277]
Approved by McDarren
Front-paged by tye