lhs substr(): refs vs. scalars

renodino has asked for the wisdom of the Perl Monks concerning the following question:

This performance question has pestered me for a while, so I'm hoping someone can shed light without my having to do deep SV deconstruction:

I'm managing some large binary buffers (64K+, handled as scalars). I need to modify various pieces of the buffer in situ. Logically, using pack(), unpack(), and substr(), everything works fine.

But what of performance and passing the buffers around between methods/objects? Since perl is pass-by-value, I assume my big buffer gets copied unless I explicitly pass it as a ref. But what happens with substr() (or other builtins)? Are they optimized to use the underlying SV without making a copy? Obviously, lhs substr() modifies the original, but is it making a copy, modifying the copy, then replacing the SV's buffer, or does it truly work in situ?

I've tried perusing the perl source, but there are so many substr's, I don't know which is which.

Update:

Many thanks to all the responders. Just to get a sense of the impact, I wrote a little (imprecise) test script:

use Time::HiRes qw(time);

my $buffer = "\0" x 65000;

my $start = time();

substr($buffer, 10, 4) = pack('L', $_)
    foreach (1..900000);

print "inline took ", (time() - $start), " secs\n";

$start = time();

bySVBuffer3Arg($buffer, $_)
    foreach (1..900000);

print "bySVBuffer3Arg() took ", (time() - $start), " secs\n";

$start = time();

bySVBuffer4Arg($buffer, $_)
    foreach (1..900000);

print "bySVBuffer4Arg() took ", (time() - $start), " secs\n";

$start = time();

byRefBuffer(\$buffer, \$_)
    foreach (1..900000);

print "byRefBuffer() took ", (time() - $start), " secs\n";

$start = time();

byCopyBuffer($buffer, $_)
    foreach (1..900000);

print "byCopyBuffer() took ", (time() - $start), " secs\n";


sub byCopyBuffer {
    my ($buf, $val) = @_;
    substr($buf, 10, 4) = pack('L', $val);
    return 1;
}

sub byRefBuffer {
    my ($buf, $val) = @_;
    substr($$buf, 10, 4, pack('L', $$val));
    return 1;
}

sub bySVBuffer3Arg {
    substr($_[0], 10, 4) = pack('L', $_[1]);
    return 1;
}

sub bySVBuffer4Arg {
    substr($_[0], 10, 4, pack('L', $_[1]));
    return 1;
}

and got these results (WinXP, 2.4GHz, AS 5.8.6):

C:\Perl>perl bufref.pl
inline took 0.919242858886719 secs
bySVBuffer3Arg() took 1.52185487747192 secs
bySVBuffer4Arg() took 1.23141598701477 secs
byRefBuffer() took 2.32297611236572 secs
byCopyBuffer() took 20.3989539146423 secs

So even passing refs is about half the speed of direct param manipulation. And the copy is really expensive.

Update 2:

Updated above code to include a 4 arg substr() direct from params, which seems about 20% faster than an lhs 3 arg substr().


Replies are listed 'Best First'.
Re: lhs substr(): refs vs. scalars by BrowserUk (Patriarch) on Oct 08, 2005 at 17:06 UTC
You know you can use substr as an lvalue or with a fourth parameter to avoid duplicating your big scalars. If you need to use unpack to decode small chunks of the buffer, or pack to overwrite small chunks, use them in conjunction with substr to avoid copying:

    my @decoded = unpack '...', substr $bigScalar, $offset, $size;

    substr $bigScalar, $offset, $size, pack '...', @newValues;
    # or
    substr( $bigScalar, $offset, $size ) = pack '...', @newValues;

But be very sure that the size specified in substr and the size of the result from pack match exactly; otherwise you will be expanding or shrinking your big scalar by the difference, which will lead to nasty surprises. It may be better to use an intermediary variable here:

    my $replacement = pack '...', @newValues;
    substr( $bigScalar, $offset, length $replacement ) = $replacement;

It is also possible (from 5.8.5 onwards) to set up an array of lvalue references to chunks of your scalar and then manipulate the individual chunks through indirection:

    ## Create a scalar
    perl> $bigScalar = 'the quick brown fox jumps over the laxy dog';;

    ## Create an array of lvalue refs to the individual words using \substr ...
    perl> @lvrefs = map{ \substr $bigScalar, $_->[0], $_->[1] }
              [0,3], [4,5], [10,5], [16,3], [20,5], [26,4], [31,3], [35,4], [40,3];;

    ## Indirecting through the elements of the array gives you the words
    perl> print $$_ for @lvrefs;;
    the quick brown fox jumps over the laxy dog

    ## And assigning through the elements allows you to replace them, in-place, individually
    perl> ${ $lvrefs[7] } = 'lazy';; ## The ${ ... } is necessary.

    ## The typo is now corrected.
    perl> print $bigScalar;;
    the quick brown fox jumps over the lazy dog

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
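The length-mismatch caveat in the reply above can be seen in a few lines. This is only a minimal sketch: the 16-byte buffer, the offset, and the helper name `poke_u32` are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical 16-byte buffer standing in for the large one.
my $buffer = "\0" x 16;

# Mismatch: a 4-byte slot replaced by a 2-byte value shrinks the buffer.
my $shrunk = $buffer;                      # deliberate copy for the demo
substr($shrunk, 4, 4, pack('S', 1));       # 'S' packs exactly 2 bytes
print length($shrunk), "\n";               # 14

# Safer: take the length from the packed replacement itself.
# poke_u32 is an illustrative helper; it works on $_[0] directly,
# so the caller's buffer is modified without being copied.
sub poke_u32 {
    my $packed = pack('L', $_[2]);         # 'L' packs exactly 4 bytes
    substr($_[0], $_[1], length $packed, $packed);
    return;
}

poke_u32($buffer, 4, 0xDEADBEEF);
print length($buffer), "\n";                      # 16
print unpack('L', substr($buffer, 4, 4)), "\n";   # 3735928559
```

Deriving the length from the replacement keeps the buffer size constant whatever template is used, at the cost of one small temporary.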
Re^2: lhs substr(): refs vs. scalars by renodino (Curate) on Oct 08, 2005 at 18:24 UTC
Thanks to all. I've learned much today, and will be revising a lot of scripts to avoid the param copying. I don't know how I've been YAPH'ing for 8 years without knowing this 8^/. It will certainly be interesting to see what sort of performance boosts I get. Is it safe to assume the ref'ing of substr() will survive into future versions?
Re^3: lhs substr(): refs vs. scalars by BrowserUk (Patriarch) on Oct 08, 2005 at 18:57 UTC
I'm the wrong person to ask, but I would assume so, as they (substr refs) have steadily been corrected and improved over the last few versions. They have been available since before my time (5.6.1), but through a bug in the implementation, there was originally only 1 lvalue ref available at a time for each given string in the program. This was fixed in 5.8.5.

The most useful use of them is processing fixed length record files, where you allocate the input buffer and create an array of lvalue refs to the fields. You can then read or sysread subsequent records directly into the buffer, overlaying the previous record, and the fields array now refers to the fields of the new record. It saves re-divvying the buffer over and over for each record, which can save a good deal of memory (re)allocation when processing large files.

Add a few seeks and you have an efficient and fairly cache friendly way of doing in-place editing on huge, fixed record length files. Not that they're much in vogue these days, but they do have their uses :). I played with manipulating huge tiff images like this one (Warning!!! 11,477 x 7,965 x 24 image 204MB) directly on disk.
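The fixed-length record technique described above might look roughly like this. It is a sketch under assumptions: the record layout (a 4-byte id followed by a 6-byte name), the sample data, and the in-memory filehandle are invented for illustration; a real program would open a file on disk.

```perl
use strict;
use warnings;

# Hypothetical data: 3 records of 10 bytes each (4-byte id, 6-byte name).
my $data   = "0001alice 0002bob   0003carol ";
my $reclen = 10;

# In-memory filehandle standing in for a real fixed-record file.
open my $fh, '<', \$data or die "open failed: $!";

# Allocate the record buffer once, then take lvalue refs to its fields.
my $rec = ' ' x $reclen;
my ($id, $name) = map { \substr($rec, $_->[0], $_->[1]) } [0, 4], [4, 6];

my @seen;
while (read($fh, $rec, $reclen) == $reclen) {
    # The same two refs now see the fields of the newly read record.
    push @seen, join(':', $$id, $$name);
}
close $fh;

print "$_\n" for @seen;
```

The field refs are created once, outside the loop; each read overlays the buffer and the refs are simply dereferenced again.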
Re^4: lhs substr(): refs vs. scalars by renodino (Curate) on Oct 08, 2005 at 19:41 UTC
Re^4: lhs substr(): refs vs. scalars by ysth (Canon) on Oct 09, 2005 at 03:45 UTC
Re^5: lhs substr(): refs vs. scalars by BrowserUk (Patriarch) on Oct 09, 2005 at 04:33 UTC
Some notes below your chosen depth have not been shown here
Re: lhs substr(): refs vs. scalars by Corion (Patriarch) on Oct 08, 2005 at 16:36 UTC
Perl is not really pass-by-value - the things in @_ are aliases to the actual variables you pass in, not copies:

    sub foo_ize {
        for my $val (@_) {
            $val =~ s!bar!foo!gi;
        };
    };

    my @arr = qw( baz bar baz BarBar );
    foo_ize @arr;
    print join ",", @arr;

The "pass by value" comes into effect once you employ the standard practice of copying your parameters:

    sub foo_ize_val {
        my @args = @_;                      # copies the parameters
        map { s!bar!foo!ig; $_ } @args;     # modifies and returns the copies
    };

    my @arr = qw( baz bar baz BarBar );
    foo_ize_val @arr;                       # @arr is left unchanged
    print join ",", @arr;
    @arr = foo_ize_val @arr;                # changes arrive only via the return value
    print join ",", @arr;
Re: lhs substr(): refs vs. scalars by pg (Canon) on Oct 08, 2005 at 16:45 UTC
Another way to prove that what you see inside the sub through @_ is the same variable you see outside the sub:

    use strict;
    use warnings;

    my $a;
    print \$a, "\n";
    somesub($a);

    sub somesub {
        print \$_[0];
    }

This gives:

    SCALAR(0x182419c)
    SCALAR(0x182419c)

