http://qs321.pair.com?node_id=688335


in reply to Ways to delete start of string

After reading the other remarks here and (wildly) guessing that your string invocation and copying may not look realistic for tasks usually done with character substitutions, I tried to make the test more illustrative (re-ordered your code and added some meaningful names ;-)

The output further down was generated for different string sizes, from 2x10^1 to 2x10^4 bytes in length:

    # Purpose: In each benchmark invocation, have one (constant) string
    # copied to another, which is then modified (shortened by the first
    # character) and touched again (length determined and compared).
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    for my $n (1..4) {
        my $org_str = '|0' x 10**$n;   # generate the string in local scope
        my $mod_str = $org_str;        # do some allocation on the other string's PV
        print "string length: " . length($org_str) . "\n";
        cmpthese( -3, {
            regexsubst => sub {   # copy and modify
                ($mod_str = $org_str) =~ s/.//;
                die unless length($mod_str)+1 == length($org_str)
            },
            substr_rhs => sub {   # no point in a full string copy, simply copy what's needed
                $mod_str = substr($org_str, 1);
                die unless length($mod_str)+1 == length($org_str)
            },
            substr_lhs => sub {   # copy and modify
                substr($mod_str = $org_str, 0, 1) = '';
                die unless length($mod_str)+1 == length($org_str)
            },
            reversestr => sub {   # reverse, copy, modify, reverse
                chop($mod_str = reverse($org_str));
                $mod_str = reverse $mod_str;
                die unless length($mod_str)+1 == length($org_str)
            }
        } );
        print '- ' x 30, "\n"
    }

On my machine (Perl 5.10), the right-side substr() almost always wins (as long as the string is no longer than a few KB), while the reverse-chop-reverse always loses. The regex substitution stays within a few percent of the left-side substr() at every size tested, and both close in on the right-side substr() as the string gets longer.

The funny part is: the left-side substr() beats the right-side substr() once the string exceeds some (larger) size (at 20000 bytes it runs at 202514/s versus 119927/s). I wouldn't have thought of this one!

Results:

    string length: 20
                    Rate reversestr substr_lhs regexsubst substr_rhs
    reversestr 1120993/s         --       -12%       -13%       -48%
    substr_lhs 1273691/s        14%         --        -1%       -41%
    regexsubst 1289977/s        15%         1%         --       -40%
    substr_rhs 2144546/s        91%        68%        66%         --
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    string length: 200
                    Rate reversestr regexsubst substr_lhs substr_rhs
    reversestr  741492/s         --       -37%       -39%       -61%
    regexsubst 1172942/s        58%         --        -3%       -39%
    substr_lhs 1211322/s        63%         3%         --       -37%
    substr_rhs 1915850/s       158%        63%        58%         --
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    string length: 2000
                   Rate reversestr substr_lhs regexsubst substr_rhs
    reversestr 145793/s         --       -82%       -82%       -84%
    substr_lhs 795724/s       446%         --        -4%       -11%
    regexsubst 828896/s       469%         4%         --        -7%
    substr_rhs 894365/s       513%        12%         8%         --
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    string length: 20000
                   Rate reversestr substr_rhs regexsubst substr_lhs
    reversestr  16427/s         --       -86%       -92%       -92%
    substr_rhs 119927/s       630%         --       -39%       -41%
    regexsubst 197394/s      1102%        65%         --        -3%
    substr_lhs 202514/s      1133%        69%         3%         --
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Update:

After studying moritz's code (there is some, but hidden behind readmore tags ;-), I see that his results are compatible with mine (he tested long strings too). He didn't remark on the crossover, though, so it may simply be expected behaviour.

Contrary to what hsmyers suggested, there seems to be no 'flip-flop': the left-side substr() simply takes over on longer strings (above several KB).

my € 0.02

mwa

Re^2: Ways to delete start of string
by ysth (Canon) on May 25, 2008 at 06:13 UTC
    Both the 4-arg substr (I find left-side a confusing term) and s/// avoid copying, so should be more or less constant. They do this by just adjusting the start pointer into the string buffer and recording the offset in the slot usually used for the integer value. This is called the OOK hack (OOK being the flag set to indicate the integer slot is storing an offset).
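The forms being compared can be put side by side in a small sketch. It only shows that lvalue substr, 4-arg substr, s/// and rvalue substr all yield the same shortened string; it measures nothing and does not prove what happens internally. The variable names and the SHOW_SV environment switch are made up for this illustration. Devel::Peek is a core module whose Dump() writes the scalar's internals to STDERR; whether an OOK flag shows up there is an assumption that may depend on the Perl version.

```perl
use strict;
use warnings;

my $org_str = '|0' x 10;    # 20 characters, as in the smallest benchmark case

# lvalue substr ("left-side"): assign the empty string to the first character
my $lvalue = $org_str;
substr($lvalue, 0, 1) = '';

# 4-arg substr: the replacement is passed as the fourth argument
my $four_arg = $org_str;
substr($four_arg, 0, 1, '');

# s///: delete the first character (without /s, '.' does not match a newline)
my $subst = $org_str;
$subst =~ s/.//;

# rvalue substr ("right-side"): builds a new string from offset 1 onwards
my $rvalue = substr($org_str, 1);

my $all_agree = $lvalue eq $four_arg
             && $four_arg eq $subst
             && $subst eq $rvalue;
print "all four forms agree\n" if $all_agree;

# Optional: dump the scalar's internals (flags, buffer pointer) to STDERR.
# SHOW_SV is a hypothetical switch used only in this sketch.
if ($ENV{SHOW_SV}) {
    require Devel::Peek;
    Devel::Peek::Dump($four_arg);
}
```

Running it with SHOW_SV=1 in the environment prints the dump for the 4-arg case; the same call can be pointed at any of the other scalars for comparison.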
      This is called the OOK hack ...

      Thanks for the hint; after your explanation I can see the pattern now.

      I find left-side a confusing term

      I was under the impression that 'left hand side term' vs. 'right hand side ...' would be a terminus technicus here (?)

      Thanks & Regards

      mwa

        lvalue and rvalue would be less confusing to me.
Re^2: Ways to delete start of string
by hsmyers (Canon) on May 24, 2008 at 23:45 UTC
    I'm just looking at the results; and there is a flip-flop from first invocation to second to third. Or am I blind?

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
      and there is a flip-flop from first invocation to second to third.

      Maybe that's a semantic point I missed. Flip-flop implies, imho, two changes of position, so I thought you were talking about such "two changes". My mistake.

      Regards

      mwa