PerlMonks  

Sort undef

by Anonymous Monk
on Jun 11, 2017 at 22:23 UTC ( #1192544=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks

I am using the following to order my data structure (@$ResultsFinal) according to one specific array element ($OptOrderToDisplayTable)

@$ResultsFinal = ( sort { deaccent($a->[($OptOrderToDisplayTable)]) cmp deaccent($b->[($OptOrderToDisplayTable)])} @$ResultsFinal );

This works fine except for the following undesired behaviour: if one element is "undef", the Perl sorting mechanism puts it at the beginning of my ordered data structure. What I am trying to achieve is to put it at the end. Any suggestions?

(Sorry, no complete working example above, but I think it is clear what I am trying to achieve)

Replies are listed 'Best First'.
Re: Sort undef
by LanX (Sage) on Jun 12, 2017 at 00:04 UTC
    I think you can define your own sorting sub cmp_undef.

    Something like

    sub cmp_undef {
        my ( $a, $b ) = @_;
        return $a cmp $b if defined $a and defined $b;
        return $b cmp $a;    # invert order otherwise
    }

    @$ResultsFinal = sort \&cmp_undef(...,...), @$ResultsFinal;

    Untested!

    Update:

    wait this might go wrong if cmp_undef("",undef) returns 0. (Not sure)

    So you'd need to treat the 3 extra cases for $a, $b being undef and return -1,0,1 accordingly.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      This being a simple case, I thought the typical way to do it would be in the form:

      sort { (defined($a) <=> defined($b)) || ($a cmp $b) }
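As written, the comparator above ranks undef before defined values, because the false value returned by defined numifies to 0. A minimal sketch of the same form with the operands of <=> swapped, so that undef lands last. The sample data and the `// ''` guard (Perl 5.10+, added here to avoid an "uninitialized" warning when both sides are undef) are assumptions, not part of the thread:

```perl
use strict;
use warnings;

# Hypothetical sample data: plain strings mixed with undef.
my @data = ('b', undef, '', 'a', undef, 'c');

# defined($b) <=> defined($a) is -1 when only $a is defined, so defined
# values come first; ties fall through to an ordinary string comparison.
my @sorted = sort {
    (defined($b) <=> defined($a)) || (($a // '') cmp ($b // ''))
} @data;

print join(', ', map { defined $_ ? "'$_'" : 'undef' } @sorted), "\n";
```

For the original question, the same block would presumably compare $a->[$OptOrderToDisplayTable] and $b->[$OptOrderToDisplayTable] rather than $a and $b directly.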

        Nice! I was worried that <=> can't handle "" without warnings, but that's not the case here. :)

        (empty string is the false value returned from defined(undef) )

        EDIT

        Forget it, I'm getting old ... false is a dual var which resolves to 0 in numeric context.
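The dual-var behaviour can be checked directly. A small sketch, assuming nothing beyond core Perl; the variable names are illustrative only. It contrasts the false value returned by defined with an ordinary empty string, which does warn when used as a number:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $nothing;
my $false     = defined $nothing;   # Perl's built-in false value
my $as_string = "$false";           # empty string in string context
my $as_number = $false + 0;         # 0 in numeric context, no warning

my $plain_empty  = '';
my $plain_number = $plain_empty + 0;   # also 0, but this one warns

printf "string: '%s', number: %d, warnings: %d\n",
    $as_string, $as_number, scalar @warnings;
```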

        Cheers Rolf

      > wait this might go wrong if cmp_undef("",undef) returns 0. (Not sure)

      Actually it works: undef is not cast to "" and is hence less than the empty string°.

      But it seems you didn't use strictures; that's why I needed to add no warnings 'uninitialized';

      > So you'd need to treat the 3 extra cases for $a, $b being undef and return -1,0,1 accordingly.

      see cmp_undef2 for that solution

      use strict;
      use warnings;
      use Data::Dump;

      sub cmp_undef1 {
          return $a cmp $b if defined $a and defined $b;
          no warnings 'uninitialized';
          return $b cmp $a;    # invert order otherwise
      }

      sub cmp_undef2 {
          if (defined $a) {
              if (defined $b) {
                  $a cmp $b;
              } else {
                  -1
              }
          } else {
              if (defined $b) {
                  1
              } else {
                  0
              }
          }
      }

      my @array = ( (undef)x 3, qw/c b a/, ("")x3);

      my @result = sort cmp_undef1 @array;
      dd \@result;

      @result = sort cmp_undef2 @array;
      dd \@result;

      output

      ["", "", "", "a", "b", "c", undef, undef, undef]
      ["", "", "", "a", "b", "c", undef, undef, undef]

      I don't want to elaborate on the sort SUBNAME LIST syntax; see sort for details and other variations. (TIMTOWTDI, I chose the shortest.)

      Cheers Rolf

      °) which is strange because

      DB<1> p "" eq undef
      1
        > actually it works undef is not cast to "" and hence less than empty string

        No, it works because sort is stable and you declared the lucky case to be the input. Try again with

        use List::Util qw{ shuffle };
        my @array = shuffle((undef) x 3, qw/c b a/, ("") x 3);
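A sketch of how the shuffle suggestion can be turned into a repeatable check. The comparator name by_string_undef_last and the round count are assumptions for illustration; the comparator handles every undef case explicitly, in the spirit of cmp_undef2 above, so the result should not depend on the input order:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical comparator: string order for defined values, undef last.
sub by_string_undef_last {
    return $a cmp $b if defined $a && defined $b;
    return defined($b) <=> defined($a);   # -1 if only $a is defined
}

my @expected = ('', '', '', 'a', 'b', 'c', undef, undef, undef);
my $want     = join '|', map { $_ // 'undef' } @expected;

my $failures = 0;
for my $round (1 .. 100) {
    my @array  = shuffle((undef) x 3, qw/c b a/, ('') x 3);
    my @result = sort by_string_undef_last @array;
    my $got    = join '|', map { $_ // 'undef' } @result;
    $failures++ unless $got eq $want;
}

print "failures: $failures\n";
```

Swapping in cmp_undef1 for the comparator is a quick way to see whether its result depends on the input order.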

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

        Another slight variation on a custom sort sub:

        c:\@Work\Perl\monks>perl -le
         "use warnings;
          use strict;
          ;;
          use Test::More 'no_plan';
          use Test::NoWarnings;
          ;;
          my $OptOrd = 1;
          ;;
          my $Test = [
            [ 1, 'c' ], [ 2, 'a' ], [ 3, '' ], [ 4, undef ], [ 5, 'foo' ],
            [ 6, '' ], [ 7, 'cee' ], [ 8, undef ], [ 9, 'tee' ], [ 10, 't' ],
            ];
          ;;
          my $Expected = [
            [ 3, '' ], [ 6, '' ], [ 2, 'a' ], [ 1, 'c' ], [ 7, 'cee' ],
            [ 5, 'foo' ], [ 10, 't' ], [ 9, 'tee' ], [ 4, undef ], [ 8, undef ],
            ];
          ;;
          my @got = sort by_deaccented_ascending_with_undef_highest @$Test;
          ;;
          is_deeply \@got, $Expected, 'custom sort sub';
          ;;
          done_testing;
          ;;
          sub deaccent { return $_[0]; }
          ;;
          sub by_deaccented_ascending_with_undef_highest {
            my ($aa, $bb) = ($a->[$OptOrd], $b->[$OptOrd]);
            ;;
            return defined $aa && defined $bb
              ? deaccent($aa) cmp deaccent($bb)
              : defined $bb cmp defined $aa
              ;
            }
         "
        ok 1 - custom sort sub
        1..1
        ok 2 - no warnings
        1..2
        defined returns '' (empty string) for undefined, 1 for defined, and '' will lexically cmp below 1.


        Give a man a fish:  <%-{-{-{-<

Re: Sort undef
by KurtZ (Friar) on Jun 11, 2017 at 22:38 UTC
    Well, deaccent() must handle undef differently, for example by returning a string guaranteed to always sort last, such as 'X'x100.

    100 is a guess; you don't show us your data.

      Thanks

      This was quite easy...

      'Z' comes after 'X'x100, no matter how big you make 100.
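A quick sketch of that objection, using made-up sample strings: string comparison is decided at the first differing character, so the length of the sentinel does not help once another value starts with a character above 'X'.

```perl
use strict;
use warnings;

my $sentinel = 'X' x 100;                     # the proposed "maximal" string
my @values   = ('Zebra', $sentinel, 'Apple'); # hypothetical data
my @sorted   = sort @values;                  # default string comparison

# Report where the sentinel ended up (0-based index).
my ($position) = grep { $sorted[$_] eq $sentinel } 0 .. $#sorted;
print "sentinel at index $position of $#sorted\n";
```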
Re: Sort undef
by Discipulus (Abbot) on Jun 11, 2017 at 23:19 UTC
    If sort works like this you can modify the returned array:  push @$ResultsFinal, shift @$ResultsFinal;

    If you have many of them you can also  push @$ResultsFinal, shift @$ResultsFinal until $$ResultsFinal[0];

    PS or in the case undef has to be discarded you can @$ResultsFinal = ( sort {  ...  } grep {defined} @$ResultsFinal );
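Building on the grep idea, the undef rows can also be kept rather than discarded: split the list in two, sort only the rows with a defined key, and append the rest. A minimal sketch under assumptions: $col stands in for $OptOrderToDisplayTable, the rows are invented, and the deaccent() call is left out because its definition is not shown in the thread.

```perl
use strict;
use warnings;

my $col = 1;   # hypothetical stand-in for $OptOrderToDisplayTable

# Hypothetical rows: [ id, sort key ].
my $ResultsFinal = [ [1, 'c'], [2, undef], [3, 'a'], [4, undef], [5, 'b'] ];

# Partition once, so the comparator never sees undef.
my @with_key    = grep {  defined $_->[$col] } @$ResultsFinal;
my @without_key = grep { !defined $_->[$col] } @$ResultsFinal;

@$ResultsFinal = (
    ( sort { $a->[$col] cmp $b->[$col] } @with_key ),
    @without_key,    # undef rows keep their original relative order
);

print join(',', map { $_->[0] } @$ResultsFinal), "\n";
```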

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Sort undef
by marinersk (Priest) on Jun 12, 2017 at 12:37 UTC

    Another way to do it, while certainly not the most efficient, is to replace undef elements with a value guaranteed to both be the maximum value in the comparison and uniquely distinct from the original value set.

    I show this inefficient technique more to sow seeds for inspiration for future problems you may run into where setting a maximum value might be a useful idea. Data transmogrification still has value in some cases; having that tool in your toolshed is useful.

    Coupled with the approach used by LanX, you could probably craft a fairly efficient and reliable version of this approach.

    In this example, I convert the original array such that undef elements are replaced with a unique identifier guaranteed to sort to the bottom of the list (based on the assumption this is a string comparison); the replacement string is guaranteed to not have been found in the original array (again, via an inefficient technique -- I force it to be longer than the longest element in the original array).

    From there, the rest of the operation should be relatively intuitive.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Constants (Operationally, if not functionally)
    my $MAXCHR = chr 255;
    my $HORRUL = '-------------------------------------------------------------------------------';

    # Globals
    my $Maxval = '';
    my @Unsorted = ( 'Dog', 'Cat', 'Bird', undef, 'Elephant', undef, 'Lizard' );

    dumpArray("Original:", @Unsorted);

    # Build a maximum-value scalar: Length = longest + 1
    # We will temporarily replace all undefined elements with this value
    # Since it is composed of all chr 255 values, it is the highest for sort
    # Since it is one character longer than the longest element, it uniquely
    # identifies the element as a former undef
    # Then sort can do its thing and we can detect these to convert back to undef
    my $maxlen = longestElement(@Unsorted);
    while (length($Maxval) <= $maxlen) { $Maxval .= $MAXCHR; }

    # Build a working copy of the array with undef replaced by $Maxval
    my @unsortedWork = ();
    foreach (@Unsorted) {
        if (defined $_) {
            push @unsortedWork, $_;
        } else {
            push @unsortedWork, $Maxval;
        }
    }

    # Sort, Convert modified entries back to undef, and show results
    my @sorted_custom = sort @unsortedWork;
    my @sorted_final = ();
    foreach (@sorted_custom) {
        if ($_ eq $Maxval) {
            push @sorted_final, undef;
        } else {
            push @sorted_final, $_;
        }
    }
    dumpArray("Custom Sort:", @sorted_final);

    exit;

    # Display an array with a title
    sub dumpArray {
        my ($dumpTitle, @dumpValues) = @_;
        print "$HORRUL\n$dumpTitle\n$HORRUL\n";
        foreach my $dumpValue (@dumpValues) {
            if (defined $dumpValue) {
                print "$dumpValue\n";
            } else {
                print "(undef)\n";
            }
        }
        print "$HORRUL\n";
    }

    sub longestElement {
        my $maxlen = 0;
        foreach (@_) {
            if (defined $_) {
                if (length > $maxlen) { $maxlen = length; }
            }
        }
        return $maxlen;
    }

    Results:

    S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>sort1.pl
    -------------------------------------------------------------------------------
    Original:
    -------------------------------------------------------------------------------
    Dog
    Cat
    Bird
    (undef)
    Elephant
    (undef)
    Lizard
    -------------------------------------------------------------------------------
    -------------------------------------------------------------------------------
    Custom Sort:
    -------------------------------------------------------------------------------
    Bird
    Cat
    Dog
    Elephant
    Lizard
    (undef)
    (undef)
    -------------------------------------------------------------------------------

    S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>

Re: Sort undef
by Anonymous Monk on Jun 12, 2017 at 14:42 UTC
    Surprised nobody's suggested the Schwartzian transform on this one. Combined with marinersk's sentinel technique, it's a nice, simple, efficient solution.
    @$ResultsFinal =
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map  { [deaccent($_->[$OptOrderToDisplayTable]) // chr(255), $_] }
        @$ResultsFinal;

      That's bloody brilliant.

      One question, though. To my eye it looks to be vulnerable to the case where the original list has at least one element which starts with two or more chr(255) characters and at least one element being undef.

      Or am I missing something?

        I'm hoping deaccent will change chr(255) (y with two dots) to a plain y. It also fails if there are strings starting with Unicode characters above 255. Handling Unicode in full generality is a huge pain, so I punted. You probably need something like Unicode::Collate to do it right.
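One way to sidestep the sentinel question altogether is to carry an explicit "is undef" flag in the precomputed key instead of a magic string. This is a sketch under assumptions rather than what the poster proposed: $col stands in for $OptOrderToDisplayTable, deaccent() is a pass-through placeholder, and the rows are invented (one deliberately starts with two chr(255) characters).

```perl
use strict;
use warnings;

my $col = 1;                      # hypothetical stand-in for $OptOrderToDisplayTable
sub deaccent { return $_[0] }     # placeholder; the real one is not shown in the thread

# Hypothetical rows: [ id, sort key ].
my $ResultsFinal = [
    [1, "\xFF\xFFz"], [2, undef], [3, 'a'], [4, undef], [5, 'b'],
];

@$ResultsFinal =
    map  { $_->[2] }                                   # unwrap the original row
    sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } # flag first, then string
    map  {
        my $key = $_->[$col];
        defined $key ? [ 0, deaccent($key), $_ ]       # defined keys sort first
                     : [ 1, '',             $_ ];      # undef keys sort last
    } @$ResultsFinal;

print join(',', map { $_->[0] } @$ResultsFinal), "\n";
```

Because the flag is compared numerically before any string comparison happens, no string value in the data can outrank an undef row, whatever characters it contains.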