PerlMonks  

Re^4: Sorting based on any column

by aaron_baugher (Curate)
on May 20, 2015 at 11:25 UTC ( [id://1127236]=note )


in reply to Re^3: Sorting based on any column
in thread Sorting based on any column

It depends on whether your sample data represents an array-of-arrays, with each non-whitespace token as an element of a sub-array, or a single-level array with each line as an element. In the first case, it's fairly simple, something like this:

sub sort_aoa {
    my( $array, $column ) = @_;
    return sort { $a->[$column] cmp $b->[$column] } @$array;
}

If each element is a whole line, you'll have to split them into words before sorting. This is where a Schwartzian Transform is likely to help the most, but I'll show the basic idea and you can add that:

sub sort_lines_by_column {
    my( $array, $column ) = @_;
    return sort {
        return( (split ' ', $a)[$column] cmp (split ' ', $b)[$column] );
    } @$array;
}

(Untested. In both cases, replace the 'cmp' comparison with whatever you want.)
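A minimal sketch of how the line-based version might be called. The sample lines are made up for illustration (the original poster's data is not shown here), and the comparison is the string `cmp` from above:

```perl
use strict;
use warnings;

# Same idea as sort_lines_by_column above: split each line on
# whitespace inside the comparison and compare the chosen field.
sub sort_lines_by_column {
    my( $array, $column ) = @_;
    return sort {
        (split ' ', $a)[$column] cmp (split ' ', $b)[$column]
    } @$array;
}

# Hypothetical sample data: whitespace-separated columns.
my @lines = (
    "pear   30  x",
    "apple  4   z",
    "fig    200 y",
);

# String sort on column 0 (columns are counted from zero).
my @by_name = sort_lines_by_column( \@lines, 0 );
print "$_\n" for @by_name;
```

Each comparison splits both lines again, which is the repeated work a Schwartzian Transform avoids.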

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Replies are listed 'Best First'.
Re^5: Sorting based on any column
by Anonymous Monk on May 21, 2015 at 12:46 UTC

    Wow, thanks a lot Aaron !!!

    This is not an array of arrays, so I used the second piece of code you provided, and it works perfectly. But I have a few questions:

    1. In split you specified ' ', which I would expect to split on a single space, but in reality it splits on any number of spaces.

    2. How can I do it using a 'Schwartzian transform'? I am still a novice in Perl, so please bear with me.

    3. How can I pass the sort order (ascending or descending) to this subroutine? I tried the code shown below, but it's not working.

    sub sort_lines_by_column {
        my( $array, $column, $order ) = @_;
        my $ab;
        my $cd;
        if ($order eq 'asc') { $ab = "\$a"; $cd = "\$b"; }
        elsif ($order eq 'dsc') { $ab = "\$b"; $cd = "\$a"; };
        return sort {
            return( (split ' ', $ab)[$column] <=> (split ' ', $cd)[$column] );
        } @$array;
    }

    It's giving the error "Use of uninitialized value in numeric comparison (<=>)"

      It's giving the error "Use of uninitialized value in numeric comparison (<=>)"

      That's because this is almost certainly not doing whatever you think it's doing:

      $ab = "\$a";

      That's taking the value of $a and appending it to a backslash and making it the value of $ab, so "11" becomes "\11". I'm guessing that you're trying to make $ab a reference to $a, but to do that you'd need to leave out the quotes, and that would also change the later code.

      Personally, if I wanted to have a toggle between two different ways to sort, I'd do it like this (unless the sort comparison is very complex, in which case it should be in a separate subroutine anyway):

      sub sort_array_by_column_asc_or_desc {
          my( $array, $column, $order ) = @_;
          if( $order eq 'desc' ){
              return sort { put_descending_sort_comparison_here } @$array;
          } else { # default to ascending sort
              return sort { put_ascending_sort_comparison_here } @$array;
          }
      }

      Note: the else there isn't necessary, but I like it because it makes the choice obvious.
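A minimal sketch of that skeleton with the placeholders filled in, assuming whitespace-separated lines compared numerically. The sub name, the sample data, and the default to ascending when no order is given are assumptions for illustration:

```perl
use strict;
use warnings;

sub sort_lines_by_column_asc_or_desc {
    my( $array, $column, $order ) = @_;
    $order = 'asc' unless defined $order;   # assumed default
    if( $order eq 'desc' ){
        # descending: $b on the left, $a on the right
        return sort { (split ' ', $b)[$column] <=> (split ' ', $a)[$column] } @$array;
    }
    else {
        # default to ascending sort
        return sort { (split ' ', $a)[$column] <=> (split ' ', $b)[$column] } @$array;
    }
}

my @lines = ( "a 11 x", "b 3 y", "c 20 z" );   # hypothetical data
my @asc   = sort_lines_by_column_asc_or_desc( \@lines, 1, 'asc' );
my @desc  = sort_lines_by_column_asc_or_desc( \@lines, 1, 'desc' );
print "asc:  @asc\n";
print "desc: @desc\n";
```

Swapping $a and $b inside the block is what reverses the order; $a and $b themselves are set by sort and are used directly rather than through other variables.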

      Aaron B.

        $ab = "\$a";

        That's taking the value of $a and appending it to a backslash ...

        Actually, it's escaping the $ (dollar) character to make it a literal rather than a scalar sigil; an unescaped $ would trigger scalar interpolation:

        c:\@Work\Perl\monks>perl -wMstrict -le "my $a = 'foo'; my $ab = qq{'\$a' '$a'}; print $ab; "
        '$a' 'foo'

        (I have to use qq{} instead of "" (double-quotes) to keep Windoze command line from going nuts with escapes.)


        Give a man a fish:  <%-(-(-(-<

      UPDATE: Thanks to AnomalousMonk for catching my mistake in forgetting that map needed to return a reference to the new array. Corrected in the code below. Also, he has a very nice way to handle the sorting choice; check that out in his reply.

      1. In split you specified ' ', which I would expect to split on a single space, but in reality it splits on any number of spaces.

      Right, that's a special case for split: when the pattern is the single-space string ' ', it splits on any run of whitespace and skips leading whitespace. That usually works well unless you need something more specific -- say, if your fields are separated by tabs but can include spaces. If you need to split on a specific type or amount of whitespace, adjust the first argument to split accordingly.
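A small sketch of the difference, using a made-up line with leading spaces, repeated spaces, and one tab:

```perl
use strict;
use warnings;

# Hypothetical input: two leading spaces, three spaces, a tab, one space.
my $line = "  alpha   beta\tgamma delta";

# Special case ' ': split on runs of whitespace, skip leading whitespace.
my @awk = split ' ', $line;

# Regex / /: split on every single space, so empty fields appear.
my @single = split / /, $line;

# Regex /\t/: split on tabs only, so spaces stay inside the fields.
my @tabs = split /\t/, $line;

print scalar(@awk),    " fields with ' '\n";
print scalar(@single), " fields with / /\n";
print scalar(@tabs),   " fields with /\\t/\n";
```

With ' ' the column numbers stay stable no matter how the columns are padded, which is usually what you want when sorting by column.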

      2. How can I do it using a 'Schwartzian transform'? I am still a novice in Perl, so please bear with me.

      I wouldn't call that a novice-level technique; I probably used Perl for several years before creating a ST myself. You can find plenty of examples and tutorials on it. But basically, it goes something like this:

      # in pseudo-code:
      for each element
          calculate the sorting value for that element
          put the original element and the sorting value into a 2-element list
      pass these 2-element lists to your sorting routine, which sorts
          based on the sorting value of each element
      for each element in the sorted list of 2-element lists
          pull out the original element

      # in perl, an example using the typical map/sort/map layout
      # to sort a list of numbers based on the return value of a
      # complex subroutine that calculates the number of primes
      # less than each number's 25th power:
      my @newarray = map { $_->[0] }
                     sort { $a->[1] <=> $b->[1] }
                     map { [ $_ => n_primes_less_than_25th_power($_) ] }
                     @oldarray;

      The essence is that each element of @oldarray is passed to the map which does the complex calculation on that element and creates a 2-element array containing the original element and the calculated value. References to those are passed to the sort, which can then sort on the calculated values without needing to recalculate them for every comparison. The sorted references then go to the map which just passes on the original values.

      Your case would look much like this, except that instead of calling my n_primes...() subroutine, you'd have some code (or a call to a subroutine) in that location that parses out the value on which you want to sort. The first element $_->[0] would be the line, and the second element $_->[1] would be the parsed-out value for sorting.
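A minimal sketch of that for whitespace-separated lines, assuming the chosen column is numeric. The sub name and sample data are made up for illustration:

```perl
use strict;
use warnings;

# Schwartzian Transform: split each line once, sort on the cached
# field, then pull the original lines back out.
sub st_sort_lines_by_column {
    my( $array, $column ) = @_;
    return map  { $_->[0] }                            # 3. keep only the line
           sort { $a->[1] <=> $b->[1] }                # 2. compare cached keys
           map  { [ $_, (split ' ', $_)[$column] ] }   # 1. build [ line, key ]
           @$array;
}

my @lines  = ( "a 11 x", "b 3 y", "c 20 z" );   # hypothetical data
my @sorted = st_sort_lines_by_column( \@lines, 1 );
print "$_\n" for @sorted;
```

Read the chain from the bottom up: the last map runs first. A line with too few columns would yield an undefined key and a warning, so real data may need a check in step 1.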

      Aaron B.

        my @newarray = map { $_->[0] }
                       sort { $a->[1] <=> $b->[1] }
                       map { $_ => n_primes_less_than_25th_power($_) }
                     @oldarray;

        The essence is that each element of @oldarray is passed to the map which does the complex calculation on that element and creates a 2-element array containing the original element and the calculated value. Those are passed to the sort ...

        There's a big problem here. The first map expression must return a reference to a two-element array to pass to sort:
            map { [ $_ => n_primes_less_than_25th_power($_) ] }

        Note that the => (fat comma) in the above expression is just an idiosyncratic variation on the more common , (comma) operator, so another version of the expression might be:
            map [ $_, n_primes_less_than_25th_power($_) ],
        (which also dispenses with the enclosing { ... } code block (update: because [ ... ] is a simple expression); note this needs a terminating comma operator).

        One way to easily control sort order is to realize that the <=> and cmp operators (see perlop) return -1, 0, or 1 as the result of their comparisons. If you have a scalar $order that may have only the values 1 (ascending) or -1 (descending), it's easy to control ordering:
            sort { $order * ($a->[1] <=> $b->[1]) }
        (among other ways, of course).
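A minimal sketch that puts the $order multiplier into a transformation sort over whitespace-separated lines. The sub name, the sample data, and the default of 1 are assumptions for illustration:

```perl
use strict;
use warnings;

# $order is expected to be 1 (ascending) or -1 (descending);
# multiplying the comparison result by -1 reverses the sort.
sub st_sort_lines {
    my( $array, $column, $order ) = @_;
    $order = 1 unless defined $order;   # assumed default: ascending
    return map  { $_->[0] }
           sort { $order * ( $a->[1] <=> $b->[1] ) }
           map  { [ $_, (split ' ', $_)[$column] ] }
           @$array;
}

my @lines = ( "a 11 x", "b 3 y", "c 20 z" );   # hypothetical data
my @up    = st_sort_lines( \@lines, 1,  1 );
my @down  = st_sort_lines( \@lines, 1, -1 );
print "up:   @up\n";
print "down: @down\n";
```

A caller that prefers words could map 'asc' and 'desc' to 1 and -1 before calling the sub.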

        For tutorials on sorting and transformation sorts, see List Processing, Filtering, and Sorting, and in particular Understanding transformation sorts (ST, GRT), the details. See also A Fresh Look at Efficient Perl Sorting for an in-depth discussion of ST and GRT sorting.


        Give a man a fish:  <%-(-(-(-<

      If you're willing (and able) to move your data into an AoA structure (i.e., $array[$row][$col] format) and willing to forgo the Schwartzian transform aspect, Data::Table can sort your data on any selected column in ascending or descending order. And "table::sort can take a user supplied operator, this is useful when neither numerical nor alphabetic order is correct."

      Also, if you ever need to do complex sorting (such as sort first by column A in ascending order and then sort by column B in descending order...), Data::Table can handle that too.
