Sort and related stuff

by leons (Pilgrim)
on Feb 23, 2001 at 15:06 UTC ( [id://60460] : perlquestion )

leons has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I was kind of puzzled when I found out the following. It is
probably perfectly logical, however ... it puzzled me.

I have the following piece of code (thanks ChOas) which sorts
version-numbers (1.0.a, 1.0.10, 5.5.001, et cetera).

Now I decided that I also needed a similar function that I could
use to compare two version-numbers (returning -1 when the left
parameter is smaller than the right, 0 when they're
equal and 1 when the left is greater than the right).
This is exactly the same function, apart from the fact that I need
to pass two parameters to it. So I decided to try and
integrate them into one subroutine ...

#!/usr/bin/perl -w

use strict;

sub byVersion;

my ($v1,$v2)=("","");

print "$v1 is equal to $v2\n" if (byVersion($v1,$v2)== 0);
print "$v1 is greater than $v2\n" if (byVersion($v1,$v2)== 1);
print "$v1 is less than $v2\n" if (byVersion($v1,$v2)==-1);

sub byVersion
{
  my @First=split /\./,$a||shift;
  my @Second=split /\./,$b||shift;

  foreach(0..(($#First>$#Second)?$#Second:$#First))
  {
    next if ($First[$_] eq $Second[$_]);
    return (($First[$_]=~/^\d+$/o)&&($Second[$_]=~/^\d+$/o))?($First[$_]<=>$Second[$_]):($First[$_] cmp $Second[$_]);
  }
  return ($#First<=>$#Second);
}

Doing this results in the following error:

srmpilot@IBP-test$ ./sort.s
Name "main::a" used only once: possible typo at ./sort.s line 20.
Name "main::b" used only once: possible typo at ./sort.s line 21.
 is less than

Which of course is very understandable. I do not use the $a and $b
variables (normally passed on by sort) in this example ... I would, however,
in an example in which I needed to actually sort a bunch of versions (see below).

#!/usr/bin/perl -w

use strict;

sub byVersion;

my ($v1,$v2)=("","");

#-----------------------------
my @row=(1,2,3,4);
my @nrow=sort byVersion @row;
print "@nrow\n";
#-----------------------------

print "$v1 is equal to $v2\n" if (byVersion($v1,$v2)== 0);
print "$v1 is greater than $v2\n" if (byVersion($v1,$v2)== 1);
print "$v1 is less than $v2\n" if (byVersion($v1,$v2)==-1);

sub byVersion
{
  my @First=split /\./,$a||shift;
  my @Second=split /\./,$b||shift;

  foreach(0..(($#First>$#Second)?$#Second:$#First))
  {
    next if ($First[$_] eq $Second[$_]);
    return (($First[$_]=~/^\d+$/o)&&($Second[$_]=~/^\d+$/o))?($First[$_]<=>$Second[$_]):($First[$_] cmp $Second[$_]);
  }
  return ($#First<=>$#Second);
}

Running this results in:

srmpilot@IBP-test$ ./sort.s
1 2 3 4
 is less than

I DO understand that it sorted the row correctly:
sort set the $a and $b variables, so no
warning appeared for that part. However, I do NOT understand why
the Name "main::a" used only once: possible typo warning
didn't occur the second time, while comparing the $v1 and $v2
variables. The second time it goes into the byVersion sub,
$a and $b aren't set by sort (just as in the first example).

So a) what happened to the warning? and b) is there an easier
way to use byVersion for both of the purposes mentioned above?

I thought of doing something like:

my ($val1,$val2)=("1.1.2","1.1.3");
print "Greater than\n" if ((sort byVersion ($val1,$val2))[-1] eq $val1);

Which works o.k. for this example, however it would give problems
for Smaller than and especially for equals.

Suggestions anyone ? Your help is very much appreciated !

Thanks and Bye, Leon

Replies are listed 'Best First'.
Re: Sort and related stuff
by Corion (Patriarch) on Feb 23, 2001 at 15:44 UTC

    Disclaimer: I have not looked into the Perl internals of sorting, so this is based just on my naive observations.

    The method of using $test = $a || shift is an interesting way to implement a dual-use function, but you get bitten by Perl magic as you see.

    In the first case, Perl rightfully complains about the variables $a and $b only being used once, since that is what you effectively do.

    In the second case, deep Perl (optimization) magic is at work, since the variables $a and $b get filled with sort parameters for speed reasons and thus get "used" as far as Perl (or rather, the warnings-part of perl) is concerned. So, for sorting, $a and $b have been declared globally by the sort magic, and then warnings finds them already used (and maybe even declared) and thus shuts up.

    I don't see any good solution to your dilemma, as the two or three approaches I see only lead down to beaten but bad paths :

    • Make a third routine, called by the two routines : Bad idea for performance reasons (if that's a reason). At least, performance is the reason why Perl uses the magic variables $a and $b instead of simply passing parameters around. If performance is of no concern, use this way.
    • Make two routines, one to be called directly, and one to be called by sort. Even worse idea. Code duplication leads the way to hell and very subtle errors. If you have two routines seemingly intended to do the same, you should make sure they actually do the same.
    • Predeclare $a and $b with my. Interesting idea - you should comment on _why_ you're predeclaring $a and $b. This seemed like a good idea, until I tested this variant and found a very interesting way to create hellish errors, illustrated below.

    #!/usr/bin/perl -w
    use strict;

    my ($a, $b); # Bad bad bad thing !
    ($a, $b) = ( "5","6" );

    sub mySort {
      return ($a||shift) cmp ($b||shift);
    };

    print "\n1: ", mySort( "1","2" );
    print "\n2: ", join " ", sort qw!d b c a!;
    print "\n3: ", mySort( "8","7" ), "\n";
    print "\n\$a: $a, \$b : $b\n";
    When run, this code prints:
    G:\>perl -w
    1: -1
    2: a b c d
    3: -1

    $a: 5, $b : 6
    Here, cases 1 and 2 look pretty sane, and as you see, even the my-declared variables $a and $b didn't get clobbered. But (and there's always a "but") case 3 is a very nice example of how to hide errors in interesting and nonobvious ways. When not called via sort, the routine now accesses the my-declared variables $a and $b, which have been initialized (by a stupid user understanding none of the code) to (5,6), and that messes up all subsequent compares. So this is a really bad idea, because you can't make sure that no one initializes your "protectively" predeclared variables...

    I'm not sure that there is no perlish idiom for handling such cases, but I haven't seen them yet :-)
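
The first option in the list above (one shared routine called by both uses) could be sketched roughly as follows. This is only an illustration built on the byVersion logic from the question: the name compare_versions is made up here, and the extra sub call per comparison is exactly the performance cost mentioned in that bullet.

```perl
#!/usr/bin/perl -w
use strict;

# compare_versions is a hypothetical name. It takes both versions as
# explicit arguments and never looks at $a or $b.
sub compare_versions {
    my @first  = split /\./, shift;
    my @second = split /\./, shift;

    # Walk the fields both versions have in common.
    my $last = $#first < $#second ? $#first : $#second;
    for my $i (0 .. $last) {
        next if $first[$i] eq $second[$i];
        # Numeric compare when both fields are all digits, string compare otherwise.
        return ($first[$i] =~ /^\d+$/ && $second[$i] =~ /^\d+$/)
            ? $first[$i] <=> $second[$i]
            : $first[$i] cmp $second[$i];
    }
    # All shared fields are equal: the version with fewer fields sorts first.
    return $#first <=> $#second;
}

# Thin wrapper for sort: the only place that touches $a and $b.
sub byVersion { compare_versions($a, $b) }

my @sorted = sort byVersion qw(1.0.10 1.0.a 5.5.001 1.0.2);
print "@sorted\n";

my $result = compare_versions("1.1.2", "1.1.3");
print "1.1.2 is less than 1.1.3\n" if $result == -1;
```

Because sort is the only caller of byVersion, a direct call never depends on whatever $a and $b happen to contain.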


      Thanks a lot for your help ! I just tried the solution Tyke
      came up with. And it seems like a pretty good one (use vars qw/$a $b/;)
      I tried it and it works perfectly and I haven't been able to
      create a situation in which it doesn't work. So I think I'll go for
      this one.

      Thank you all for thinking along with me ! ;-)

      And now for some lunch ! Bye, Leon

        This solution has the same problems as my my-predeclare solution has. If somebody is stupid enough to assign something to $a and $b, then this method won't work for the single-call case. But on the other side, the single-call case will never work, if somebody actually uses variables called $a and $b in any way other than within a sort-callback.

Re: Sort and related stuff
by davorg (Chancellor) on Feb 23, 2001 at 15:25 UTC

    Sounds like a bug in your version of Perl - which version are you using? It's my understanding that $a and $b are explicitly excluded from things like the 'used only once' checks for precisely this reason.

    For the record, here on 5.005_02 it works as expected - without the error message.


    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

      Hmmm ... that's kind of weird.
      I use version 5.6.0 for sun4-solaris

      Thanks, Leon 'still puzzled' Spaans
Re: Sort and related stuff
by Tyke (Pilgrim) on Feb 23, 2001 at 15:29 UTC
    When sort calls 'byVersion' it sets up $a and $b as package global variables. So these variables are used more than once -- when they are being set and when they are being used.

    If you add the line

      use vars qw/$a $b/;
    you 'declare' these variables as main package globals and you should get rid of the warning.
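
A minimal sketch of the declaration in use, under some assumptions: by_number is a made-up comparator, and it tests defined($a) instead of the $a||shift from the question so that a value of 0 is not mistaken for "not called from sort". It relies on $a and $b being undefined outside of sort, so it only holds as long as nothing else in the package assigns to them.

```perl
#!/usr/bin/perl -w
use strict;
use vars qw/$a $b/;    # declare the sort globals up front

# Hypothetical dual-use comparator: inside sort it reads $a and $b,
# in a direct call it falls back to its arguments.
sub by_number {
    my $left  = defined $a ? $a : shift;
    my $right = defined $b ? $b : shift;
    return $left <=> $right;
}

my @sorted = sort by_number (10, 9, 100);   # called by sort
my $direct = by_number(3, 7);               # called directly

print "@sorted\n";
print "3 is less than 7\n" if $direct == -1;
```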
Re: Sort and related stuff
by leons (Pilgrim) on Feb 23, 2001 at 15:25 UTC
    I just added:

    my @dummy=sort byVersion ();

    To the top of my first example and that actually 'solves'
    the problem. No error message appeared and the program actually

    srmpilot@IBP-test$ ./sort.s
     is less than

    I guess this means that when the compiler finds a certain
    subroutine in combination with sort somewhere in the
    code, it decides that it must be a 'sort'-function and will
    therefore be treated different than a regular subroutine ...

    .... I'm still puzzled .... ;-)
(ichimunki) re: Sort and related stuff
by ichimunki (Priest) on Feb 23, 2001 at 17:38 UTC
    Lots of good answers here already. Short form, $a and $b are only used once in the first example, which is why you're being warned. In the second example, sort is using them as well. No warning.

    For the record, I would strongly recommend not building a subroutine that relies on external data that may or may not exist correctly. In this case, if I do two sorts and then do a call to this sub with variables set by the first sort, your subroutine will use the $a and $b vars from the second sort rather than the arguments I sent the function.
Re: Sort and related stuff
by sierrathedog04 (Hermit) on Feb 23, 2001 at 17:37 UTC
    The code posted above includes the following lines:
    my @First=split /\./,$a||shift;
    my @Second=split /\./,$b||shift;
    foreach(0..(($#First>$#Second)?$#Second:$#First))

    Could anyone explain to me what the # in the variable name does?

    The Llama and Camel books describe how # can be used for comments, formats and the shebang execute command. But in this code the # affixed to the name of an array appears to be neither a comment nor a format nor a shebang. A # affixed to the start of an array name automagically becomes ... well, I don't know what it becomes.

    Does anybody know?

      $#list_name returns the index number of the last element in @list_name. This will give you an off-by-one error if you try to use it to count the number of elements in a list (so use $count = @list_name for that). Also useful for erasing a list: $#list_name = -1;

      See perldoc perldata for more information.
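
A small sketch of the three uses just described; the array contents are made up for illustration.

```perl
use strict;
use warnings;

my @list_name = qw(apple banana cherry);

my $last_index = $#list_name;   # index of the last element
my $count      = @list_name;    # array in scalar context: number of elements

print "last index: $last_index, count: $count\n";

$#list_name = -1;               # shrink the array to zero elements
my $count_after = @list_name;

print "count after erasing: $count_after\n";
```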
Re: Sort and related stuff
by goldclaw (Scribe) on Feb 23, 2001 at 18:23 UTC
    Perhaps this could be a job for use overload. Use byVersion as the comparison operator, and create a simple function for the stringification operator. If your constructor simply blesses a reference to the string, the stringification would be something as simple as:
    sub to_string { my $s=shift; return $$s; }
    I've just done something similar with the version names we use for our software releases. They are a bit weirder, but it works great...
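
A rough sketch of how the use overload idea might look, assuming a made-up class name Version and reusing the comparison logic of byVersion from the question. The constructor blesses a reference to the string, as described above; everything else is one possible arrangement, not the code actually used for those releases.

```perl
#!/usr/bin/perl -w
use strict;

package Version;    # hypothetical class name

use overload
    '<=>' => \&compare,
    'cmp' => \&compare,
    '""'  => \&to_string;

# The object is just a blessed reference to the version string.
sub new {
    my ($class, $string) = @_;
    return bless \$string, $class;
}

sub to_string { my $s = shift; return $$s; }

# overload passes a third argument that is true when the operands were swapped.
sub compare {
    my ($left, $right, $swapped) = @_;
    ($left, $right) = ($right, $left) if $swapped;

    my @first  = split /\./, "$left";
    my @second = split /\./, "$right";
    my $last = $#first < $#second ? $#first : $#second;
    for my $i (0 .. $last) {
        next if $first[$i] eq $second[$i];
        return ($first[$i] =~ /^\d+$/ && $second[$i] =~ /^\d+$/)
            ? $first[$i] <=> $second[$i]
            : $first[$i] cmp $second[$i];
    }
    return $#first <=> $#second;
}

package main;

my ($v1, $v2) = (Version->new("1.0.2"), Version->new("1.0.10"));
print "$v1 is less than $v2\n" if $v1 < $v2;

my @sorted = sort { $a <=> $b } map { Version->new($_) } qw(5.5.001 1.0.a 1.0.10);
print "@sorted\n";
```

With this arrangement the ordinary comparison operators cover the single-compare case and a plain sort block covers the sorting case, so no subroutine has to guess whether it was called by sort.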


Re: Sort and related stuff
by turnstep (Parson) on Feb 24, 2001 at 06:00 UTC

    This has been covered before. Here is the solution I came up with:

    @sorted =
      map { $_->[0] }
      sort {
        $x=1;
        while (defined $a->[1][$x]) {
          defined $b->[1][$x] or return -1;
          if ($x%2) { ## Strict numeric comparison
            return +1 if $a->[1][$x] > $b->[1][$x];
            return -1 if $a->[1][$x] < $b->[1][$x];
          }
          else { ## Non-numeric comparison
            return +1 if $a->[1][$x] gt $b->[1][$x];
            return -1 if $a->[1][$x] lt $b->[1][$x];
          }
          $x++;
        }
        return defined $b->[1][$x] ? 1 : 0;
      }
      map { [$_, [split(/(\d+)/, $_)]] } @unsorted;
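
The odd/even test on $x in the code above depends on how split behaves with a capturing pattern: the captured digit runs are returned along with the pieces between them, so for a version that starts with a digit they land on the odd indices. A small sketch with a made-up input string:

```perl
use strict;
use warnings;

# With a capturing group, split returns the separators (the digit runs) too.
my @tokens = split /(\d+)/, "1.0.10";

# Print each token with its index to show which positions hold the numbers.
for my $i (0 .. $#tokens) {
    printf "%d: '%s'\n", $i, $tokens[$i];
}

my $joined = join "|", @tokens;
print "$joined\n";
```

Index 0 is an empty string here because the input starts with a digit run, which is presumably why the comparison loop starts at $x=1.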
Re: Sort and related stuff
by princepawn (Parson) on Feb 24, 2001 at 01:26 UTC