http://qs321.pair.com?node_id=575918

Zadeh has asked for the wisdom of the Perl Monks concerning the following question:

Hi, When writing subroutines, I notice in the docs that usually a "my $var = shift;" idiom is used if the sub only expects one input, and a "my ($v1,$v2,$v3) = @_;" kind of idiom for multiple inputs. Why use shift, though? Or, put another way, why not always use @_?

Re: shift vs @_
by Fletch (Bishop) on Oct 02, 2006 at 18:24 UTC

    Well, if you mean why not use things from @_ directly: items in @_ are aliases to the things passed in; if you're not careful you can unintentionally modify something of your caller's.

    my $foo = "bar";
    sub bad_example { $_[0] =~ s/a/oo/; $_[0] }
    print "\$foo: $foo\n";
    print "bad_example returns: ", bad_example( $foo ), "\n";
    print "uh oh, \$foo: $foo\n";

    Now if you just mean why not always use the my( $a, $b, $c ) = @_; form vice shift: there's times when you want to pull off some items and then do something list-y with the remaining contents of @_. Best example off the top of my head would be something that builds a hash from key/value pairs like:

    sub take_one_and_hash {
        my $one = shift;
        my %hash = @_;
        ## frobulate $one based on %hash . . .
    }

    Unless you pull that first argument off with shift you'd have to do something like @_[1..$#_] which just looks crufty.
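To make the comparison concrete, here is a minimal sketch of both spellings. The sub names with_shift and with_slice and the sample arguments are made up for illustration; both subs hand back the same values.

```perl
use strict;
use warnings;

# Hypothetical helper: peel off the first argument, treat the rest as key/value pairs.
sub with_shift {
    my $one  = shift;    # removes the first element from @_
    my %hash = @_;       # whatever is left becomes the hash
    return ($one, \%hash);
}

# Same result without shift: the slice skips index 0.
sub with_slice {
    my $one  = $_[0];
    my %hash = @_[ 1 .. $#_ ];
    return ($one, \%hash);
}

my ($one_a, $hash_a) = with_shift('id', colour => 'red', size => 3);
my ($one_b, $hash_b) = with_slice('id', colour => 'red', size => 3);
print "$one_a: $hash_a->{colour}, $hash_a->{size}\n";
print "$one_b: $hash_b->{colour}, $hash_b->{size}\n";
```

Both work; the shift version simply leaves @_ holding exactly the list you want to use next.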

      Now if you just mean why not always use the my( $a, $b, $c ) = @_; form vice shift: there's times when you want to pull off some items and then do something list-y with the remaining contents of @_. Best example off the top of my head would be something that builds a hash from key/value pairs like:

      Another good example is subclassing...

      sub do_something {
          my $self = shift;
          $self->SUPER::do_something( @_ );
          # do some more stuff
      }

      We're not surrounded, we're in a target-rich environment!

      I don't usually use shift even if I want to do something with the rest of the arguments. I do, like, my($one, %hash) = @_;.

Re: shift vs @_
by grep (Monsignor) on Oct 02, 2006 at 18:36 UTC
    I will often use multiple shifts if I want to short circuit a sub when a non-true value is passed.
    sub foo {
        my $bar = shift || return;
        my $baz = shift || return '<p style="color:red">BAD</p>';
    }

    Again I'll note that this works only for non-true values, IOW if 0(zero) is acceptable then don't do this.



    grep
    Mynd you, mønk bites Kan be pretti nasti...

      if 0(zero) is acceptable then don't do this.

      More specifically, if one of the following is acceptable, then don't do this:

      • 0 (numerical zero),
      • '0' (the string consisting exactly of the character zero),
      • '' (the zero-length string),
      • undef (the undefined value), or
      • something that evaluates to one of the above in boolean context
      Careful: Perl's true may not coincide with real true. "0" and 0 are equivalent.
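If false-but-valid values such as 0 or '' have to get through, one option is to test for definedness, or for the number of arguments, instead of truth. A minimal sketch, assuming Perl 5.10 or later for the // operator; foo_defined and foo_count are hypothetical names, not anything from the posts above.

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operator // needs Perl 5.10 or later

# Bail out only when the argument is undef; 0, '0' and '' all pass.
sub foo_defined {
    my $bar = shift(@_) // return 'missing';
    return "got [$bar]";
}

# Bail out only when no argument was passed at all; even an explicit undef passes.
sub foo_count {
    return 'missing' unless @_;
    my $bar = shift;
    return defined $bar ? "got [$bar]" : 'got undef';
}

my @results = (
    foo_defined(0),        # got [0]
    foo_defined(''),       # got []
    foo_defined(),         # missing
    foo_count(undef),      # got undef
    foo_count(),           # missing
);
print "$_\n" for @results;
```

Writing shift(@_) with explicit parentheses keeps the parser from having to guess whether // starts a pattern match.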
Re: shift vs @_
by hgolden (Pilgrim) on Oct 02, 2006 at 18:22 UTC
    To put it one way, they are always using @_. If you don't give an array to shift, it uses @_.

    I think the "shift" idiom caught on out of laziness, though it's not as slick with more than one argument. You can write:

    my $var1 = shift; my $var2 = shift;...
    Likewise, you can write:
    my ($var)=@_;
    for only one argument.

    Hays

Re: shift vs @_
by sgifford (Prior) on Oct 02, 2006 at 19:18 UTC
    As others have said, mostly this is a matter of personal style. Sometimes it can make a difference, though, since shift actually modifies @_.

    For example, this code could allow a function to act as a regular sub, or as a method (assuming the first argument will never be a reference unless it's called as a method):

    my $self = ref $_[0] ? shift : undef; my($arg1,$arg2,$arg3)=@_;

    Similarly, if you have a sub that takes a list as its final argument, shifting off the non-list arguments can make it clearer what you're doing, especially if you pass the list on to any other subs:

    sub my_sort {
        my $sortname = shift;
        my $sorter   = $SORTS{$sortname};   # sort only accepts an unsubscripted scalar as its SUBNAME
        return sort $sorter @_;
    }

    This can also be useful if you decide to goto another sub, since that sub will see the modified @_ as its argument list.
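As a rough illustration of the goto case, here is a sketch of a dispatcher that shifts off a handler name and then jumps to the handler. The %HANDLER table and the sub name dispatch are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each handler only knows about its own operands.
my %HANDLER = (
    add => sub { my ($x, $y) = @_; return $x + $y },
    mul => sub { my ($x, $y) = @_; return $x * $y },
);

sub dispatch {
    my $name    = shift;    # @_ now holds only the operands
    my $handler = $HANDLER{$name} or die "unknown handler '$name'";
    goto &$handler;         # the handler runs with the current, shortened @_
}

my $sum     = dispatch('add', 2, 3);
my $product = dispatch('mul', 2, 3);
print "$sum $product\n";    # 5 6
```

Had dispatch copied its arguments with my ($name, @rest) = @_ instead, the handler reached through goto would still see the name as its first argument.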

    Finally, as a matter of style, I usually get $self or $class with shift in methods and constructors, since it makes it clearer what the explicit arguments are, as compared to the implicit object reference or class name passed in as the first argument.

    Update: Corrections and clarifications from jimt.

      my $self = shift if (ref $_[0]); my($arg1,$arg2,$arg3)=@_;

      Eeek! This sort of thing can rapidly get you into serious trouble. First of all, there's the fact that this won't operate properly if it's called as a class method such as Some::Package->foo( qw(arg1 arg2 arg3) ); $_[0] is a string, in that case.

      Secondly, it breaks down if you pass in a normal reference as your first argument. foo({'hashkey' => 'hashval'}, qw(arg4 arg5)); You end up setting a normal hashref as $self.

      And thirdly, and potentially most importantly, if the conditional fails, the variable is still in scope but its initialization is skipped. There are other threads on the board talking about the pitfalls of constructs like my $xyz if 0, which is what can happen here. The short answer is that in this case $xyz would end up acting like a static variable. The first time, it starts out as undef (that's not guaranteed and could change at any time, not that it has), and on subsequent calls it keeps whatever value it had at the end of the previous call.

      It doesn't directly cause problems here, probably, but observe this contrived function.

      sub foo {
          my $self = shift if (ref $_[0]);
          $self++ unless ref $self;
          my($arg1,$arg2,$arg3)=@_;
          print "($self)($arg1)($arg2)($arg3)\n";
      }
      foo(qw(a1 a2 a3));
      foo(qw(a3 a4 a5));

      Output:
      (1)(a1)(a2)(a3)
      (2)(a3)(a4)(a5)

      Note how $self survived between calls and was around to increment. Bugs of this nature can be horrible to track down. It's also frowned upon to try to use this for static variables - use a closure or wrap the subroutine in an additional lexical scope to have "static" variables scoped to the subroutine. Besides, it's tougher to read this way. In general, it's best to always avoid my $foo if $something constructs, except for obfuscated contests or the like.
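For comparison, the two supported ways of getting a "static" variable mentioned above might look like the following. This is only a sketch; next_id and make_counter are hypothetical names.

```perl
use strict;
use warnings;

# The extra block gives $calls a lexical scope that only next_id can see.
{
    my $calls = 0;          # initialised once, when this block runs

    sub next_id {
        return ++$calls;    # state survives between calls, on purpose
    }
}

# Closure variant: each generated counter carries its own private $count.
sub make_counter {
    my $count = shift;
    $count = 0 unless defined $count;
    return sub { return $count++ };
}

my $first    = next_id();
my $second   = next_id();
my $from_ten = make_counter(10);
my @seen     = ($from_ten->(), $from_ten->());
print "$first $second @seen\n";    # 1 2 10 11
```

In both cases the persistence is visible in the code, rather than being a side effect of a skipped initialization.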

        Thanks jimt, I've updated that example to use:
        my $self = ref $_[0] ? shift : undef;
        which doesn't seem to suffer from the largest of the problems. I've also added some clarifications about when it will and won't work.

        As an aside, you would have a very hard time convincing me that this behavior of my and conditionals is anything but a bug in Perl. :)

        First of all, there's the fact that this won't operate properly if it's called as a class method such as Some::Package->foo( qw(arg1 arg2 arg3) ); $_[0] is a string, in that case.

        Secondly, it breaks down if you pass in a normal reference as your first argument. foo({'hashkey' => 'hashval'}, qw(arg4 arg5)); You end up setting a normal hashref as $self.

        I can get around some of those problems. First, if we assume that the subroutine was written as a function originally, and had no concept of self:

            shift if UNIVERSAL::isa ($_[0], __PACKAGE__);

        We can also deal with the possibility that we need self for the class name (e.g., in case someone calls it as a method to override inherited routines, while existing code still assumes it's a function):

        my $self = __PACKAGE__;
        $self = shift if UNIVERSAL::isa ($_[0], __PACKAGE__);

        It completely handles your first issue; however, this will break if you have a method that takes as its first argument an item of its same type (and someone tries calling it as a function), or if someone does some sort of multiple inheritance that results in the first argument inheriting from this class (and then calls it as a function). It also adds an additional problem case where a function argument that's a string containing the name of a package that inherits the package in question is assumed to be the 'Class->method()' syntax.

      Besides style, is there any performance penalty paid by shift? In the docs I see: "Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down." My reading of this makes me think that shift is potentially an expensive operation, because it has to "shift" the front of the array off, and then copy the remaining elements all back one position.

        This has been covered a few times. Check out:

        The bottom line? Perl implements arrays by creating a block of memory, and then pointing the beginning of the array as an offset into that block. When you try to unshift too much onto the beginning of the array such that we run out of room at that end of the chunk of memory, perl goes to allocate more memory, and, again, keeps a chunk free at the beginning. However, if you're shifting off the array until it's empty, perl just keeps incrementing the "beginning" pointer until the beginning and end point at the same place, meaning a length of zero. There is no copying here whatsoever.

        They seem to be about the same. In this test shift and direct iteration over @_ are effectively tied, while copying @_ into a new array first is roughly a third slower.

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Benchmark;

        timethese(1_000_000, {
            'use_shift'  => sub { sub_with_shift(0..9) },
            'use_list'   => sub { sub_with_list(0..9) },
            'use_direct' => sub { sub_with_direct(0..9) },
        });

        sub sub_with_shift {
            my $sum = 0;
            while (@_) { $sum += shift; }
            $sum;
        }

        sub sub_with_list {
            my(@a)=@_;
            my $sum = 0;
            $sum += $_ for @a;
            $sum;
        }

        sub sub_with_direct {
            my $sum = 0;
            $sum += $_ for @_;
            $sum;
        }

        Benchmark: timing 1000000 iterations of use_direct, use_list, use_shift...
        use_direct:  7 wallclock secs ( 6.48 usr + -0.01 sys =  6.47 CPU) @ 154559.51/s (n=1000000)
          use_list: 10 wallclock secs ( 9.85 usr +  0.06 sys =  9.91 CPU) @ 100908.17/s (n=1000000)
         use_shift:  6 wallclock secs ( 6.48 usr +  0.01 sys =  6.49 CPU) @ 154083.20/s (n=1000000)
        
        Besides style, is there any performance penalty paid by shift?

        Yes; if you do it repeatedly hundreds of thousands of times in a tight loop, you might slow down your program as much as performing one IO operation. In other words, none that you will ever notice.

        Interesting point, but it would be a lot easier to shift the zeroth index point up one element, presuming that the array members are themselves string descriptors or other pointers. That plus decrementing the array size would be pretty cheap.
Re: shift vs @_
by shmem (Chancellor) on Oct 02, 2006 at 19:33 UTC
    Arguments are passed into a subroutine via @_. Accessing one element or more of that array can be done in two ways, non-destructive (the @_ array persists as passed in) or destructive (the @_ array is consumed during assignment):
    # 1. non-destructive
    my $var  = $_[0];    # first element of @_
    my @list = @_;       # complete @_

    # 2. destructive
    my $var = shift;                        # first element gets removed from @_
    my ($foo, $bar) = map { shift } 1,2;    # two elements get removed from @_

    If @_ needs to be intact for subsequent calls, as in

    sub foo {
        my $foo = $_[0];
        my $quux = baz(@_);
        return $foo ^= $quux;
    }

    the non-destructive methods are used. Also, as the variables passed in via the vector @_ are aliases to the thingies the caller provided, the non-destructive way is often used to do in-place transformations.

    Otherwise, it just doesn't matter. Since after setting up the variables in the sub's scope @_ isn't looked at anymore, arguments may be shifted or not. In these cases saying $var = shift or $var = $_[0] does the same for the sub, although the impact on @_ is different.

    So, saying $var = shift or @list = @_ is just about making copies; once that is done, nobody cares about @_ any more.

    Using the arguments in a sub without prior assignment (i.e. without copying, as $_[0] .. $_[$#_]) means that any modification changes the thingies in the caller.

    --shmem

    <update> changed reference to alias as per tye's post </update>

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Assume I have some large strings. I don't want them copied when I send them as parameters to subroutines/methods (which doesn't modify them).

      Do I have to use the $_[x] notation for that? Should I send them in as references?

        Both ways are ok. $_[0] is actually an alias (thanks, tye!) to the first argument passed by the caller. You can evaluate (or operate on) $_[0] directly. Passing arguments as references just adds another level of indirection.
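A small sketch of the two options for read-only access to a large string, assuming the sub only needs to inspect the text. The sub names and the sample data are invented for the example.

```perl
use strict;
use warnings;

# Reads the caller's string through the alias in @_; the string is not copied.
sub count_newlines_alias {
    return ( $_[0] =~ tr/\n// );    # tr with an empty replacement only counts
}

# Same job with an explicit reference.
sub count_newlines_ref {
    my $text_ref = shift;           # copies only the reference, not the string
    return ( $$text_ref =~ tr/\n// );
}

my $big       = "line\n" x 100_000;
my $via_alias = count_newlines_alias($big);
my $via_ref   = count_newlines_ref(\$big);
print "$via_alias $via_ref\n";      # 100000 100000
```

The reference version costs one extra dereference but makes it obvious at the call site that the sub gets access to the caller's data.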

        You can also use prototypes for your subs and access your arguments inside the sub as references:

        #!/usr/bin/perl
        sub foo(\@\$) {
            warn "foo args: (".join(",",map{"'$_'"}@_).")\n";
            print "1st argument = $_[0]; content =(" . join(',',map{"'$_'"} @{$_[0]}).")\n";
            print "2nd argument = $_[1]; content = '".${$_[1]}."'\n";
            my ($array,$scalar) = @_;
            push @$array, $$scalar;
            $$scalar = "blurf";
        }
        my @ary = qw(foo bar baz);
        my $foo = "blah";
        foo(@ary,$foo);
        print "ary: (@ary)\n";
        print "foo: $foo\n";

        output:

        foo args: ('ARRAY(0x8167870)','SCALAR(0x8167978)')
        1st argument = ARRAY(0x8167870); content =('foo','bar','baz')
        2nd argument = SCALAR(0x8167978); content = 'blah'
        ary: (foo bar baz blah)
        foo: blurf

        --shmem

Re: shift vs @_
by tomhukins (Curate) on Oct 02, 2006 at 20:11 UTC

    I can think of one reason not yet mentioned: Multi-line shift makes patches more readable. Imagine you have:

    sub blah {
        my $foo = shift;
        my $bar = shift;
    }

    If I want to add another argument (maybe at this point I should consider hash-based arguments as above, but let's assume I can't, maybe for backwards compatibility), I might rewrite this as:

    sub blah {
        my $foo = shift;
        my $bar = shift;
        my $baz = shift;
    }

    The patch to add this argument only contains one additional line with the new argument, making the code easier for people reviewing changes in your version control system or reviewing patches before committing them.

    If you change my ($foo, $bar) = @_; to my ($foo, $bar, $baz) = @_; traditional patch/diff output shows one line removed and one added which makes it a little less clear that the change does nothing other than add one new argument.

      I'm not going to claim this is better but it does allow you to meet your change management goals while still using the my()=@_ form.
      sub blah {
          my(
              $foo,
              $bar,
              $baz,
          ) = @_;
      }
        Actually, I think I will claim it is better. In theory it should be easier to notice changes since there is less clutter -- no leading "my" and no trailing "= shift" that someone eyeballing the code has to filter out so the variables themselves should stand out better.

      That depends on the diff tool you use. TortoiseSVN's diff shows changes within a line for example, so the extra verbiage associated with the shift is not required. Besides, generally the number of times you have to check deltas to solve a problem are many fewer than the number of times you look at the code in the process of writing or using it. An idiom that provides clarity in writing and use is more important than occasional merge or delta issues.

      Using the list assignment form makes it easier to match the parameter list in the call with the required parameters - the assignment list self documents the required parameters (especially if good names are used).


      DWIM is Perl's answer to Gödel
Re: shift vs @_
by eyepopslikeamosquito (Archbishop) on Oct 02, 2006 at 21:51 UTC

    This question is discussed at length in the excellent Perl Best Practices book in the third item of Chapter 9, "Argument Lists: Always unpack @_ first". Notice that this chapter is available free online as the book's sample chapter.

    In this item, TheDamian argues that either version is acceptable, with the shift-based version preferred when one or more arguments needs to be sanity checked or documented with a trailing comment. The most important thing though, is to unpack them at the start of the subroutine and to avoid accessing them directly as $_[0], $_[1] etc.
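A sketch of what the shift-based form with trailing comments and sanity checks could look like. The sub pad_text and its parameters are invented for illustration and are not taken from the book.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sub: every argument is unpacked, documented and checked up front.
sub pad_text {
    my $text  = shift;    # string to pad
    my $width = shift;    # total width, must be a positive integer
    my $fill  = shift;    # optional pad character, defaults to a space

    croak 'pad_text: text is required' unless defined $text;
    croak 'pad_text: width must be a positive integer'
        unless defined $width && $width =~ /\A[1-9][0-9]*\z/;
    $fill = ' ' unless defined $fill && length $fill;

    # After this point the body never touches @_ again.
    my $missing = $width - length $text;
    return $missing > 0 ? $text . substr($fill, 0, 1) x $missing : $text;
}

my $padded = pad_text('abc', 6, '.');
print "[$padded]\n";    # [abc...]
```

One shift per line leaves room for a comment per argument, which is harder to do tidily with a single list assignment.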

      just to prevent a perception of your statement as dogmatic -
      The most important thing though, is to unpack them at the start of the subroutine and to avoid accessing them directly as $_[0], $_[1] etc.

      - in most cases, yes. Operating directly on $_[0] is always fine if you know what you are doing, and why, e.g. the sub and the caller are designed that way to avoid costly copying. Whether in such cases a reference should be passed in the first place is another story. But then, $_[0] is an alias already...

      --shmem

Re: shift vs @_
by Anonymous Monk on Oct 03, 2006 at 08:27 UTC
    Suppose you write a function like map: its first argument plays a very different role from the rest, which can be any length. Wouldn't it make sense to write:
    sub my_map {
        my $special_thingy = shift;
        foreach my $item (@_) {
            apply($special_thingy, $item);
        }
    }
    Starting with
    my ($special_thingy, @list_of_items) = @_;
    is a potentially costly operation, since you are copying everything else in @_ - and it is slightly different as you lose the aliasing. And writing:
    sub my_map {
        foreach my $item (@_[1 ..$#_]) {
            apply($_[0], $item);
        }
    }
    or
    sub my_map {
        for (my $i = 1; $i < @_; $i++) {
            apply($_[0], $_[$i]);
        }
    }
    is not something I fancy.
Re: shift vs @_
by exussum0 (Vicar) on Oct 03, 2006 at 12:24 UTC
    My favourite reason to use shift, that's not necessarily right, but right for me? I can't easily do my $x, $y = @_;

    It's valid perl. It assigns a scalar. You will likely run into the bug right away, but who knows when, eh?

    I can't do the equiv w/ shift in the form of...

    my $x = shift; my $y = shift;
    Unless i do..
    my $x = $_[0]; my $y = $_[1];
    Now I have to worry about indices. At least shift requires none.

      I can't easily do my $x, $y = @_; It's valid perl. It assigns a scalar.

      Only because the way you called my creates a scalar context. Try my( $x, $y ) = @_;, which creates a list context, and so produces the same result as my $x = $_[0]; my $y = $_[1];.

      This works with any array or list. Check out the difference between:

      $x, $y, $z = (2,3,4);      # $x = undef, $y = undef, $z = 4
      ($x, $y, $z) = (2,3,4);    # $x = 2, $y = 3, $z = 4

      See, a list in scalar context will give its last element. An array in scalar context will give the number of its elements. However, if we make both sides of the assignment have list context, then elements get copied.
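The three cases can be checked side by side. A minimal sketch with made-up variable names; the scoped no warnings 'void' is only there because Perl would otherwise warn about the discarded constants in the comma-operator case.

```perl
use strict;
use warnings;

my @args = (2, 3, 4);

my ($x, $y) = @args;    # list assignment: $x = 2, $y = 3
my $count   = @args;    # array in scalar context: the element count, 3
my ($first) = @args;    # parentheses on the left force list context: 2

my $last;
{
    no warnings 'void';     # the discarded 2 and 3 would otherwise trigger warnings
    $last = (2, 3, 4);      # comma operator in scalar context: the last value, 4
}

print "$x $y $count $first $last\n";    # 2 3 3 2 4
```

The difference between $count and $first is the one that bites most often: the only thing that changed is the pair of parentheses after my.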

      <radiant.matrix>
      A collection of thoughts and links from the minds of geeks
      The Code that can be seen is not the true Code
      I haven't found a problem yet that can't be solved by a well-placed trebuchet
        Um, yeah. But it doesn't prevent me from being human and making a mistake. It's harder to get it wrong w/ a progression of shifts.

      What's wrong with my $x, $y = @_? Putting Perl specifics aside (not worrying about syntax, such as whether you should say my ($a, $b)), more and more languages support this kind of de-grouping operation.

        I personally believe that for once Cop's question is legitimate, but one cannot put "Perl specifics away" since this is Perl: while avoiding a pair of parentheses may represent an occasional advantage -I for one try to avoid them all the time, if possible- in this particular case wouldn't square at all with Perl's whole syntax and semantics. Anyway, as this very thread and tons of similar ones show, Perl's current argument passing mechanism is both fascinating for the far reaching consequences it gets out of an extreme simplicity on the one hand, and awkward on the other one. This is why after some initial period of being skeptical I now cherish Perl 6's message passing. For very simple stuff I can still rely on @_ and not worry at all. Suppose I want to write a sub that will take a list (of strings) and concat it after making all lowercase every element out of two starting with the first, and all uppercase the other ones; then I'd write it like this:

        pugs> sub foo { [~] map -> $a, $b { lc $a, uc $b }, @_ };
        pugs> say foo <FoO bAr BaZ>;
        fooBARbaz

        But if I were to write a sub that would take two integers and return the product of integer numbers comprised between them, then I would write:

        pugs> sub prod (int $n, int $m) { [*] $n..$m };
        pugs> say prod 3, 6;
        360
        --
        If you can't understand the incipit, then please check the IPB Campaign.