Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Problem with a sort result

by kzwix (Sexton)
on Jan 14, 2016 at 17:30 UTC ( [id://1152791]=perlquestion: print w/replies, xml ) Need Help??

kzwix has asked for the wisdom of the Perl Monks concerning the following question:

My salutations, ô wise ones.

I've specified a custom block for a sort, so that I look at the value in a related hash to determine the correct order (I'm ordering IPv4 addresses from the most glutton in bandwidth to the least).

So, I wrote:

return sort {return ($refHash->{$b}->{'NBOctets'} - $refHash->{$a}->{'NBOctets'});} keys %$refHash;

($refHash being a reference on a HASH, whose keys are the IPs I want to order, and whose values associated to the keys are references to hashes (namely, for each IP, a hash holding a lot of data, including a 'NBOctets' key, mapping to a number of bytes - The very value I want to order them by, in descending order).

It works quite well, as the following extract of the final result can testify:

172.16.1.104   : 13908480193 octets envoyés, en 100667.7338 secondes, soit un débit moyen de 138162.245915185 octets/seconde.
17.253.55.206  : 771918754 octets envoyés, en 1355.3769 secondes, soit un débit moyen de 569523.321520383 octets/seconde.
172.16.1.120   : 705928230 octets envoyés, en 527922.346599999 secondes, soit un débit moyen de 1337.18194455382 octets/seconde.
17.253.55.204  : 480978512 octets envoyés, en 1630.4618 secondes, soit un débit moyen de 294995.26575845 octets/seconde.
62.210.167.210 : 468692611 octets envoyés, en 594.0011 secondes, soit un débit moyen de 789043.338471932 octets/seconde.
172.16.1.76    : 425684357 octets envoyés, en 261235.63 secondes, soit un débit moyen de 1629.50343718428 octets/seconde.
172.16.1.1     : 288170793399 octets envoyés, en 621335.8493 secondes, soit un débit moyen de 463792.317349231 octets/seconde.
204.92.52.217  : 406751806 octets envoyés, en 373702.6954 secondes, soit un débit moyen de 1088.43690721745 octets/seconde.
13.107.4.50    : 388980279 octets envoyés, en 1578.6308 secondes, soit un débit moyen de 246403.578974894 octets/seconde.
172.16.1.91    : 287922363 octets envoyés, en 70079.6916999999 secondes, soit un débit moyen de 4108.49928154008 octets/seconde.
17.253.55.201  : 281729171 octets envoyés, en 1190.8791 secondes, soit un débit moyen de 236572.437118092 octets/seconde.
188.165.60.91  : 281457541 octets envoyés, en 11396.047 secondes, soit un débit moyen de 24697.8220605794 octets/seconde.
176.31.246.142 : 279936059 octets envoyés, en 11400.6948 secondes, soit un débit moyen de 24554.2981292684 octets/seconde.
62.115.249.136 : 276985087 octets envoyés, en 5152.5337 secondes, soit un débit moyen de 53757.0646068749 octets/seconde.
23.79.119.237  : 272756603 octets envoyés, en 2225.0329 secondes, soit un débit moyen de 122585.424691923 octets/seconde.
23.48.229.216  : 269146838 octets envoyés, en 1108.235 secondes, soit un débit moyen de 242860.799379193 octets/seconde.
92.103.108.76  : 256676616 octets envoyés, en 3320.139 secondes, soit un débit moyen de 77308.9970028363 octets/seconde.
15.216.225.136 : 253665959 octets envoyés, en 319.3335 secondes, soit un débit moyen de 794360.626116583 octets/seconde.
15.217.130.33  : 250905202 octets envoyés, en 262.059 secondes, soit un débit moyen de 957437.836517731 octets/seconde.
17.253.55.203  : 246865654 octets envoyés, en 619.1051 secondes, soit un débit moyen de 398745.954442953 octets/seconde.
172.16.1.63    : 239030327 octets envoyés, en 258208.8964 secondes, soit un débit moyen de 925.724598697445 octets/seconde.
184.25.107.81  : 214707248 octets envoyés, en 1373.6761 secondes, soit un débit moyen de 156301.218314856 octets/seconde.
92.103.108.91  : 207925990 octets envoyés, en 2219.8763 secondes, soit un débit moyen de 93665.5749691999 octets/seconde.

However, as you must surely have noticed, an intruder seems to not have been sorted properly: IP 172.16.1.1, which should have been at the very top of the list, and... well, is not.

Now, I've added debug displays to the sort block, and they seem to give correct results (that is, giving me a positive number if the "gluttonest" IP is $b, and a negative number if it is $a).

So, I wonder... could it be that I've hit some kind of "ceiling" above which the sort function won't work as intended ? I'd much rather use it than code a sort function myself, but, if it's not reliable, I might have to come to that...

Could you maybe help the humble maggot that I am to come one step closer to becoming a nice Perl-Guru-butterfly ?

(I have to mention that I'm working on a huge lot of data, and that I have a nearly-correctly-ordered list in exit, with that IP being the only one badly ordered - I've omitted several hundred IPs (and associated lines) after the ones I displayed)

Replies are listed 'Best First'.
Re: Problem with a sort result
by salva (Canon) on Jan 15, 2016 at 08:55 UTC
    You are running the script on a 32bit machine, right?

    There is a bug on the perl interpreter. Internally, it expects the result of the comparison operator to fit in an integer, but in that case, it doesn't. It is represented as an NV (a double float) that at some point is forcedly coerced into an integer and that operation overflows returning a bad result.

    As a workaround, just use the comparison operator (<=>) which knows how to handle any Perl internal representation for numbers correctly.

      Precious as always salva!
      Notice that two of mines 32bit versions are compiled (thanks syphilis as i understand is the patcher) using config_H.gc_64int and strawberry release notes says "32bit Strawberry Perl is compiled with USE_64_BIT_INT enabled but there exists a version without USE_64_BIT_INT"

      # testing the following oneliner perl -MConfig -E "say for $Config{archname},$Config{version}, sort {$a + - $b} qw(10000000000 3 2)" | MSWin32-x86-multi-thread | 5.14.2 | 10000000000 | 2 | 3 [OK] strawberry\perl\bin\perl.exe | MSWin32-x86-multi-thread-64int | 5.20.0 | 2 | 3 | 10000000000 [OK] straw5.20-32b\perl\bin\perl.exe | MSWin32-x64-multi-thread | 5.16.2 | 2 | 3 | 10000000000 [OK] straw64\perl\bin\perl.exe | MSWin32-x86-multi-thread-64int | 5.22.0 | 2 | 3 | 10000000000 [OK] strP5.22-32\perl\bin\perl.exe

      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Nope, Sir. I do run it on a 64bit server (linux), and the Perl I use on it shows the following first line when using '-v':

      This is perl 5, version 16, subversion 0 (v5.16.0) built for x86_64-linux-thread-multi

      Also, I think that my "-" implementation should have been within the limits of the "sort" specifications (by the way, I don't know who I should contact about it, but it is somewhat sketchy. I mean, telling us we should provide a negative, null, or positive integer is all well and good, but telling us WHAT IT MEANS (like "a null value means that both elements have the same rank", and so on) would have been a huge improvement.

      Anyway, yes, it now works correctly when using <=>, so I'm guessing that, despite the version being a 64b Perl, there is indeed a problem with the huge negative numbers I got (possibly some kind of negative overflow, or merely a bug where the function doesn't expect as much data, and "loses" important parts, like the first characters (including the sign). I really don't know.

        Hey, there are actually 2 bugs on perl!!!

        The first is the one I described before where any number returned by the comparison subroutine is coerced into an IV, but then, there is a C function that just wraps that sub and which has a return type of I32, so the result is coerced again into a 32bit number.

Re: Problem with a sort result
by 1nickt (Canon) on Jan 14, 2016 at 17:42 UTC

    Hello kzwiz,

    Personally I've never seen `-` used as the comparison operator in a custom sort. This doesn't mean anything other than I've never seen it, but I do find the standard numerical comparison operator (`<=>`) provides your expected results:

    $ perl -E' $h={foo=>{bar=>13908480193},baz=>{bar=>771918754},qux=>{bar=>288170793 +399}}; say "$_: $h->{$_}->{bar}" for sort { $h->{$b}->{bar} - $h->{$a}->{bar} + } keys %$h; ' foo: 13908480193 baz: 771918754 qux: 288170793399
    $ perl -E' $h={foo=>{bar=>13908480193},baz=>{bar=>771918754},qux=>{bar=>288170793 +399}}; say "$_: $h->{$_}->{bar}" for sort { $h->{$b}->{bar} <=> $h->{$a}->{ba +r} } keys %$h; ' qux: 288170793399 foo: 13908480193 baz: 771918754
    Hope this helps!


    The way forward always starts with a minimal test.
      in fact, Perl does what we tell to do, and is difficult to spot some valid Perl code that was not what we intended,but it still compile.
      The tricky part is that, contrarly to what i remembered, the sort sub must not returns -1 or 0 or +1, but:returns an integer less than, equal to, or greater than 0 so all the subtractions of octets are valid return value when choosing if $a or $b must be returned.

      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
        The danger is there might be some kind of magic at work with $a, $b and the comparison operators that gets disrupted when going off book. In the case of the each function that is certainly the case.

        One world, one people

Re: Problem with a sort result
by uri (Acolyte) on Jan 15, 2016 at 12:08 UTC
    the issue is with signed vs unsigned. as the other comment said, using - instead of <=> is the main bug. this is not a perl bug. but if you do need speed, check out Sort::Maker as it will do that sort faster than your code as it will factor out all those hash lookups before the sort is done.

      Allow me to disagree.

      I mean, obviously, it's not "breaking specifications", as there does NOT seem to be available specifications (because the documentation does NOT seem to mention how "sort" orders elements), but still, it furiously looks like an unwanted behaviour, or, at the very least, a caveat worth mentioning.

      I'm guessing that adding a single paragraph to the documentation, telling that a positive result orders in the following way, a negative result orders in the other way, and a null result keeps the same order, and that it is STRONGLY RECOMMENDED to use the binary comparison operator for this, to avoid potential problems, would do the trick.

      Because, I'll be blunt, but I had no recollection whatsoever of having previously worked with the "comparison" operator. And if it's not explained why one should use it, then... why not use something more familiar, and doing the same thing ? I mean, this is Perl, and "There is more than one way to do it", right ?

        Perl often defers to C, or underlying platform, when it comes to specifications. (Sort of a hand-wave.) This seems to be the case here, as well. Read the page on qsort and compare to sort. Here's the pertinent quote:

        If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than 0, depending on how the elements of the list are to be ordered.

        So the behaviour is documented, and there is a discrepency in perl. You could try to fix the perl sources to address this issue. (But be sure then to run all tests. I've no clue why it's coded that way; there might be a reason.)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1152791]
Approved by Discipulus
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2024-04-20 01:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found