
perl string pass by value

by pwagyi (Monk)
on Apr 02, 2018 at 14:03 UTC ( [id://1212165] : perlquestion )

pwagyi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I am asking this question out of curiosity. Since a Perl string is a value type, does passing a long string as a subroutine argument incur a performance penalty? I searched around and found the note below in the perl 5.20 release notes (http://perldoc.perl.org/5.20.0/perldelta.html#Performance-Enhancements). Does that mean perl <5.20 will suffer if long strings are passed around (especially when the argument is used read-only)?

Perl has a new copy-on-write mechanism that avoids the need to copy the internal string buffer when assigning from one scalar to another. This makes copying large strings appear much faster. Modifying one of the two (or more) strings after an assignment will force a copy internally. This makes it unnecessary to pass strings by reference for efficiency.

# perl string is value type (not reference)
my $s = "foo";
my $t = $s;
$s .= "bar";
# $t is still 'foo' since assignment copies the value.

Replies are listed 'Best First'.
Re: perl string pass by value
by davido (Cardinal) on Apr 02, 2018 at 14:44 UTC

    It's worth noting that the special variable @_ contains aliases to the caller's arguments, not copies of them. Therefore, there is no copy being made here:

    sub foo { return length($_[0]); }

    ...because we're acting only on the aliased entity. Similarly, there should be no copy here (assuming the s/// operator doesn't make one):

    sub dot_to_underscore {
        $_[0] =~ s/\./_/g;
    }

    my $string = 'hello.world';
    print "$string\n";
    dot_to_underscore($string);
    print "$string\n";

    No copy because we acted upon the aliased entity, which is the same as acting upon the entity itself. On the other hand, had we unpacked our args by assigning $string to some other variable lexically scoped to the subroutine, done a substitution (probably triggering copy-on-write), and then returned the string, we would have created at least one copy.
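The difference between acting on the alias and acting on an unpacked lexical can be checked by comparing addresses. This is a minimal sketch using only core Perl; the helper names ref_to_alias and ref_to_copy are made up for the illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.

# Returns a reference to the aliased argument itself.
sub ref_to_alias { return \$_[0] }

# Unpacks the argument into a lexical first, so the reference
# points at a new scalar owned by the subroutine.
sub ref_to_copy { my ($copy) = @_; return \$copy }

my $string = 'hello.world';

# References compare numerically by address.
my $alias_is_original = ( ref_to_alias($string) == \$string ) ? 1 : 0;
my $copy_is_original  = ( ref_to_copy($string)  == \$string ) ? 1 : 0;

print "alias is the caller's scalar: $alias_is_original\n";    # 1
print "copy is the caller's scalar:  $copy_is_original\n";     # 0
```

Note that this only shows which scalar container is involved. Whether the underlying string buffer was physically duplicated for the unpacked copy depends on the Perl version, as discussed in the rest of the thread.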

    Copying small strings and simple values such as integers, and even floating point numbers is pretty fast, and for general cases we probably shouldn't care. But for long strings I can see where it would be useful to be aware of what will and will not trigger a buffer copy.

    On the other hand, consider this:

    sub dot_to_underscore {
        my $string = $_[0];   # No copy on 5.20+, but yes, copy under <5.20.
        $string =~ s/\./_/;   # Copy made under 5.20 because of copy-on-write.
        return $string;       # No copy under 5.20+ until the *caller* modifies the return string.
                              # But yes, the caller will get a copy pre-5.20.
    }

    Dave

      >  for general cases we probably shouldn't care

      IIRC COW (Copy On Write) is also relevant for forking/threading

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Wikisyntax for the Monastery

      Good point that @_ holds aliases. Unfortunately, when the sub's arguments are meant to form a hash, as in a typical subroutine call like the one below, the callee will *unpack* the arguments, and copying is unavoidable. I have millions of calls of this kind. I've yet to benchmark perl <5.20 against 5.20+. If someone has already made a study of this, it'd be good to know whether there's a big performance difference.

      Foo->new(arg1 => $a_very_long_string, arg2 => $float, arg3 => $another_long_string);

      # in new
      sub new {
          my ($class, %args) = @_;
      }

        Yes, that was an unfortunate design decision when this could have avoided the issue:

        sub new { my ($class, $args_href) = @_; .... }

        Dave
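The hash-reference convention suggested above might look like the following. This is only a sketch: the class Foo, its args field, and the arg1_length method are illustrative, and it assumes the caller already holds the arguments in a hash, so that passing a reference to it copies nothing.

```perl
use strict;
use warnings;

package Foo;

# Illustrative constructor taking a hash reference.
sub new {
    my ($class, $args_href) = @_;    # copies one reference, not the hash contents
    return bless { args => $args_href }, $class;
}

# Reads the long string through the stored reference.
sub arg1_length { return length $_[0]{args}{arg1} }

package main;

my %args = (
    arg1 => 'x' x 1_000_000,    # stands in for $a_very_long_string
    arg2 => 3.14,
    arg3 => 'y' x 1_000_000,    # stands in for $another_long_string
);

my $obj = Foo->new(\%args);

# The object holds the caller's hash itself, so new() duplicated no string.
my $same_hash   = ( $obj->{args} == \%args ) ? 1 : 0;
my $same_scalar = ( \$obj->{args}{arg1} == \$args{arg1} ) ? 1 : 0;

print "same hash: $same_hash, same scalar: $same_scalar, length: ",
    $obj->arg1_length, "\n";
```

The trade-off is that the object and the caller now share one hash, so later changes the caller makes to %args are visible through the object. Calling Foo->new({ arg1 => $long }) instead builds a fresh anonymous hash whose values are copies of the caller's scalars, which brings the copy-on-write question back.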

Re: perl string pass by value
by ikegami (Patriarch) on Apr 02, 2018 at 16:55 UTC

    Perl always passes by reference (not by value), so calling a sub doesn't copy the arguments.

    However, the standard practice is to copy the arguments into local variables (e.g. my ($x, $y) = @_;), effectively getting pass-by-value semantics.

    Since 5.20, Perl uses a copy-on-write mechanism that avoids actually copying the string until required (by the string being modified), so that copy is cheap.

    sub foo {
        my ($s) = @_;   # String copied here before 5.20
        $s =~ s/.//s;   # String copied here since 5.20
    }

    foo($str);          # No copying here.

    So,

    • It's fine to pass long strings to subs that neither modify nor copy them.
    • It's fine to pass long strings to subs that copy them, as long as they modify neither the strings nor the copies and you have Perl 5.20+.
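One way to measure these cases on a given perl is the core Benchmark module. This is a rough sketch rather than a rigorous study: the sub names and the test string size are arbitrary choices, and the timings depend on the Perl version and build, so no expected numbers are given.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $long = 'a.b' x 1_000_000;    # about 3 MB; the size is an arbitrary choice

# Reads the argument through the alias and never copies it.
sub len_alias { return length $_[0] }

# Unpacks first: a real buffer copy before 5.20,
# normally a cheap copy-on-write share on 5.20+.
sub len_copy { my ($s) = @_; return length $s }

# Unpacks and then modifies the copy, which forces a real copy.
sub len_modify { my ($s) = @_; $s =~ s/\./_/; return length $s }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    alias  => sub { len_alias($long) },
    copy   => sub { len_copy($long) },
    modify => sub { len_modify($long) },
} );
```

Running the same script under a perl older than 5.20 and under 5.20+ should show whether the "copy" case differs much between them for strings of this size.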
Re: perl string pass by value
by LanX (Saint) on Apr 02, 2018 at 14:29 UTC
    Please use [brackets around links] to make them clickable

    I don't understand your question, http://perldoc.perl.org/5.20.0/perldelta.html#Performance-Enhancements and the passage you cite are pretty clear.

    In some languages, like JS, strings are immutable°, which means you can optimize the assignment by simply sharing the (internal) reference; only if one of the copies is changed do you need to allocate new space for the cloned and changed content.²

    This has performance advantages if copies are rarely changed, and strings can become very long in Perl.

    Perl has mutable strings; a reference to a scalar will point to the changed container.
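A minimal sketch of that point, reusing the variable names from the question: the assigned copy keeps its old value, while a reference sees the in-place change.

```perl
use strict;
use warnings;

my $s = "foo";
my $t = $s;     # value semantics: $t is an independent scalar
my $r = \$s;    # reference to the container of $s

$s .= "bar";    # modifies $s in place

print "s=$s t=$t via ref=$$r\n";    # s=foobar t=foo via ref=foobar
print "still the same container\n" if $r == \$s;
```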

    NB: JS has no (external) references of so called "primitive types" (read scalars), i.e. no reference \$var operator at all. And JS doesn't have any "aliasing", where manipulating $_[0] will change the passed arguments of a sub.

    Cheers Rolf

    UPDATES

    °) https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures

    Unlike in languages like C, JavaScript strings are immutable. This means that once a string is created, it is not possible to modify it. However, it is still possible to create another string based on an operation on the original string.

    ²)

    your example in JS:

    var s = "foo"; var t = s; s += "bar";

    means a new string s + "bar" is constructed, allocated in memory and linked to the symbol s.

    But in older Perl versions $s and $t point to different memory locations, and $s .= "bar" means 3 letters are appended to the buffer holding the characters of $s (if there is buffer space left).

Re: perl string pass by value
by Anonymous Monk on Apr 02, 2018 at 16:45 UTC
    Remember that Perl was originally architected at a time when memory was at an extreme premium and CPUs were very slow. The internal handling of strings has changed to fit the times but plenty of old source code reflects what used to be.
      > Remember that Perl was originally architected at a time when memory was at an extreme premium and CPUs were very slow.

      How is unnecessary copying saving memory?

      IMHO Perl just reflected the behaviour of C.

      Cheers Rolf

        And yet, today's runtime Perl implementations must somehow strive to accommodate all of this "old" source-code, as graciously as it possibly can ... without breaking any of its now-oh-so-old assumptions . . .

      Without some kind of citation or description of what you describe one must assume you're talking through your hat.