http://qs321.pair.com?node_id=235029

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

If I do

my $s = 'The quick brown fox jumps over the lazy dog';
print ${ \substr($s, $_, 1) } for 0 .. length($s)-1;
# Output: The quick brown fox jumps over the lazy dog

Which is what I expected but if I try

my @refs;
push @refs, \substr($s, $_, 1) for 0 .. length($s) -1;
print $$_ for @refs;
# Output: ggggggggggggggggggggggggggggggggggggggggggg

and if I do

print @refs;
# Output: LVALUE(0x1bd5270) LVALUE(0x1bd5270) LVALUE(0x1bd5270) LVALUE(0x1bd5270) LVALUE(0x1bd5270) LVALUE(0x1bd5270) ...

Which I think indicates that there is only one LVALUE per string. I tried to read the source pp.c(pp_substr) to confirm this, but I'm not familiar enough with perlguts to interpret what I read.

Can anyone confirm my conclusion?


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: LVALUE refs
by Elian (Parson) on Feb 13, 2003 at 21:39 UTC
    Not really one lvalue per string, but rather one lvalue for substr in general.

      Okay. Thanks Elian. I can see why I was fooled into thinking it was one per string. If I take an LVALUE ref to one string and save it, then take an LVALUE ref to another string and save it, and then print the targets of both, they seem independent. Further investigation after your clarification shows that whilst I can still reference both independently, only the latter remains a real LVALUE; the former is converted to a standard scalar containing whatever the LVALUE was pointing at when the second LVALUE is taken.

      It's kind of a shame that it works that way; I had hoped to use an array of lvalue refs to achieve efficiencies in fixed record processing, sort of emulating COBOL-style storage overlays. I can see how it would mean major changes to allow this though.


        While this could be done, I think you'd find it not actually worth the effort, as it'll take more space and more time. The only real win would be in programmer efficiency.

        Given that programmer efficiency isn't at all a bad thing, you can duplicate substr's functionality--just write your own (presumably overriding the core substr) and have it return tied scalars that substitute back in the original. You could even enforce same-size replacement this way if you want.

        package Tie::Substr;
        use base 'Exporter';
        @EXPORT = 'substr';

        sub TIESCALAR { bless [ $_[1], 0 ], $_[0]; }

        sub FETCH {
            $_[0][1] = 1;
            @{ $_[0][0] } == 2 and return substr($_[0][0][0], $_[0][0][1]);
            @{ $_[0][0] } == 3 and return substr($_[0][0][0], $_[0][0][1], $_[0][0][2]);
            @{ $_[0][0] } == 4 and return substr($_[0][0][0], $_[0][0][1], $_[0][0][2], $_[0][0][3]);
            die;
        }

        sub STORE {
            $_[0][1] = 1;
            @{ $_[0][0] } == 2 and return substr($_[0][0][0], $_[0][0][1]) = $_[1];
            @{ $_[0][0] } == 3 and return substr($_[0][0][0], $_[0][0][1], $_[0][0][2]) = $_[1];
            eval 'substr($foo, 0, 0, 0) = ""';
            die;
        }

        sub DESTROY { $_[0]->FETCH unless $_[0][1]; }

        sub substr : lvalue {
            tie my $foo, 'Tie::Substr', \@_;
            $foo
        }

        Doesn't always give an error when used in void context (fails substr.t tests 120 and 121) and doesn't report the correct line numbers, because I was too lazy to wrap everything in evals. This substr doesn't support $[, of course, causing substr.t tests 8..14 to fail. But you shouldn't set $[ anyway. And the thing uses a tied variable, so don't expect lightning speeds.

        Apart from those small and insignificant differences, it's compatible with normal substr, but with an lvalue per substr call :)

        For testing, I used perl 5.8.0 with its own substr.t.
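        For comparison, the same-size enforcement mentioned above could be sketched with a much smaller tied scalar that covers only the fixed-field case. The package name FixedField, its constructor arguments, and the sample record layout are all hypothetical; this is not a drop-in replacement for substr, and it carries the same tie overhead.

```perl
use strict;
use warnings;

# Hypothetical package, for illustration only (not from the original post).
package FixedField;

# Remember a reference to the record plus the field's offset and length.
sub TIESCALAR {
    my ($class, $strref, $off, $len) = @_;
    return bless { str => $strref, off => $off, len => $len }, $class;
}

# Reading the tied scalar reads the field out of the record.
sub FETCH {
    my ($self) = @_;
    return substr(${ $self->{str} }, $self->{off}, $self->{len});
}

# Writing is refused unless the new value has exactly the field's length.
sub STORE {
    my ($self, $value) = @_;
    die "replacement must be exactly $self->{len} characters\n"
        unless length($value) == $self->{len};
    substr(${ $self->{str} }, $self->{off}, $self->{len}) = $value;
    return;
}

package main;

# Assumed layout: 10-char surname, 10-char first name, 4-digit id.
my $record = sprintf '%-10s%-10s%04d', 'SMITH', 'JOHN', 42;

tie my $surname, 'FixedField', \$record, 0, 10;
tie my $id,      'FixedField', \$record, 20, 4;

$id = '0043';                                   # same size: accepted
my $ok = eval { $surname = 'TOO SHORT'; 1 };    # 9 chars: rejected

print "record: [$record]\n";
print "short write rejected: $@" unless $ok;
```

        Each tied scalar holds its own offset and length, so any number of fields can overlay the same record at once, which is the COBOL-style use discussed earlier in the thread.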

        Juerd
        - http://juerd.nl/
        - spamcollector_perlmonks@juerd.nl (do not use).
        

Re: LVALUE refs
by John M. Dlugosz (Monsignor) on Feb 13, 2003 at 17:30 UTC
    Print the @refs mid-way through and verify that all the elements hold the last substr.

    To explore whether it's one lvalue per string it came from, or per substr token in the program, or whatever, try calling substr again after the loop on the same $s.

    Let us know.

    —John
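    The checks suggested above can be run as one short script. This is only a sketch: the variable names and the use of Scalar::Util's refaddr to count distinct LVALUE objects are assumptions rather than anything from the thread, and what it prints depends on the perl version in use.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $s = 'The quick brown fox jumps over the lazy dog';
my @refs;

for my $i (0 .. length($s) - 1) {
    push @refs, \substr($s, $i, 1);

    # Mid-way check: what do the references collected so far read as?
    print "after index $i: ", join('', map { $$_ } @refs), "\n" if $i == 9;
}

# Count how many distinct LVALUE objects the references point at.
my %seen;
$seen{ refaddr($_) }++ for @refs;
my $distinct = keys %seen;
print "$distinct distinct LVALUE(s) behind ", scalar(@refs), " references\n";

# Call substr again on the same string after the loop, then re-read
# the first saved reference to see whether it was disturbed.
my $extra = \substr($s, 0, 3);
my $first = ${ $refs[0] };
print "first saved reference now reads: '$first'\n";
```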

Re: LVALUE refs
by diotalevi (Canon) on Feb 14, 2003 at 15:08 UTC

    BrowserUk independently asked me about this issue and I ended up posting it as a bug to p5p. It's a bug because the behaviour differs between $r[0] = \substr ...; $r[1] = \substr ...; and $r[$_] = \substr ... for (0,1). It also looks like some hackers have picked it up and are applying a patch to correct the issue. Maybe that means it goes into the next version, maybe not. (The current perl is 5.8.0.)
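    A minimal way to compare the two forms side by side is sketched below. The variable names are made up for illustration; on a perl without the bug the two lines of output should match, while on an affected perl the looped form is the one expected to go wrong.

```perl
use strict;
use warnings;

my $s = 'The quick brown fox';

# Form 1: two separate statements, so two lexical occurrences of substr.
my @sep;
$sep[0] = \substr($s, 0, 1);
$sep[1] = \substr($s, 1, 1);

# Form 2: a single lexical occurrence of substr, executed twice.
my @loop;
$loop[$_] = \substr($s, $_, 1) for 0, 1;

my $separate = join '', map { $$_ } @sep;
my $looped   = join '', map { $$_ } @loop;

print "separate statements: $separate\n";
print "loop:                $looped\n";
print $separate eq $looped ? "the two forms agree\n" : "the two forms differ\n";
```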


    Seeking Green geeks in Minnesota

      Just for the record, a fix for this bug has been applied to the development sources as change #18705, so you can expect it to appear in perl-5.10.0 and future maintenance releases such as perl-5.8.1.

      The patch is here, so you could even apply it to your local copy of 5.8.0 if you're desperate for it (but do remember to add a reference in the local_patches in patchlevel.h if you do that).

      Hugo
Re: LVALUE refs
by xmath (Hermit) on Feb 16, 2003 at 14:09 UTC
    I haven't checked the sources, but empirical evidence made it clear to me there is only one LVALUE object per lexical occurrence of 'substr'.

    This means for example that while this works:

    $x = \substr("blah", 1, 2);
    $y = \substr("florp", 1, 3);
    print "$$x $$y\n";

    this will not:

    sub substrref { \substr($_[0], $_[1], $_[2]) }
    $x = substrref("blah", 1, 2);
    $y = substrref("florp", 1, 3);
    print "$$x $$y\n";

    The work-around is to create a new lexical occurrence each time, by using eval STRING:

    sub substrref { eval '\substr($_[0], $_[1], $_[2])' }
    $x = substrref("blah", 1, 2);
    $y = substrref("florp", 1, 3);
    print "$$x $$y\n";

    I hope this helps :-)

    (The obvious real solution is that substr() should check the refcount of the PVLV-object and create a fresh one if someone is still holding a reference to the previous one)
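    If eval STRING feels too heavy, one alternative that sidesteps LVALUE references altogether is to keep the string reference, offset and length in a closure. This is a different technique from the work-around above, and substr_handle is a made-up name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that reads the substring when
# called with no arguments and replaces it when called with one.
sub substr_handle {
    my ($strref, $off, $len) = @_;
    return sub {
        return substr($$strref, $off, $len) unless @_;   # getter
        substr($$strref, $off, $len) = $_[0];            # setter
        return;
    };
}

my $s = 'blah';
my $t = 'florp';

my $x = substr_handle(\$s, 1, 2);
my $y = substr_handle(\$t, 1, 3);

my $before = $x->() . ' ' . $y->();
print "$before\n";

$x->('LA');          # writes through to $s only
print "$s $t\n";
```

    Each closure captures its own copy of the offset and length, so the handles stay independent no matter how many times the same line of code creates them.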

      I haven't checked the sources, but empirical evidence made it clear...
      This is one of the single most dangerous statements you can make. Perl is a programming language, not an archaeological dig or a physics experiment. Use the docs and, when they prove inadequate, read the source and check with p5p to see if what you found happens on purpose or accidentally.

      Assuming "found" behaviour will persist in future versions of perl is just asking for bizarre bugs and major breakage later on--if it's not documented to work a particular way, you shouldn't assume that it will. (And yes, I know perl's docs are insufficiently rigorous to really make this statement about any of its behaviours, but the point still holds--if you had to experiment to find behaviours, and can't find reference to them in the docs, you shouldn't count on the behaviours)

        I didn't mean to imply this was The Way, For Now And Ever.

        I simply observed perl's current behavior, and a work-around for this current behavior.

        (I don't think any docs talk about this issue)

      Thanks for reporting your findings. I too am a great believer in empirical evidence, even if only as a prelude to verification via other routes. Deriving this understanding from the Perl sources would require considerably more familiarity than I currently have, or am likely to expend the time to acquire. Whilst I love Perl as a language, I absolutely hate having to delve into the source; it's like reading a completely different language rather than C.

      <personal grouse>Effective as it is, why oh why isn't it at least indented properly? In this day and age, when I routinely use a CLI window that has 160 characters of width and often push this to 200 when the need arises, why does everything have to be squashed up into the left-hand 50 chars? </personal grouse>

      I'm not sure that I'll be making any use of the information though. Once you see what is involved in an LVALUE

      perl> use Devel::Peek
      perl> $r = \substr('the quick brown fox', 10, 5)
      perl> print $r
      LVALUE(0x1bd2a70)
      perl> print $$r
      brown
      perl> Dump($r)
      SV = RV(0x1bd1cc0) at 0x1bd29f8
        REFCNT = 1
        FLAGS = (ROK)
        RV = 0x1bd2a70
        SV = PVLV(0x1bc3388) at 0x1bd2a70
          REFCNT = 1
          FLAGS = (PADMY,GMG,SMG,pPOK)
          IV = 0
          NV = 0
          PV = 0x1bc37a8 "brown"\0
          CUR = 5
          LEN = 6
          MAGIC = 0x1bc3338
            MG_VIRTUAL = &PL_vtbl_substr
            MG_TYPE = 'x'
          TYPE = x
          TARGOFF = 10
          TARGLEN = 5
          TARG = 0x1bdf074
          SV = PV(0x1bcd490) at 0x1bdf074
            REFCNT = 1
            FLAGS = (PADBUSY,POK,pPOK)
            PV = 0x1bc19f0 "the quick brown fox"\0
            CUR = 19
            LEN = 20

      you realise that LVALUE refs are far from the 'cheap' utility that they appeared (to me at least) to be. That, combined with their undocumented nature, probably outweighs any advantages that they might have provided.

        ehm, Dump() just shows it all very verbosely.. but it IS a cheap utility. Cheaper than any alternative, at least.

        All you're seeing is that $r is a reference (RV 0x1bd1cc0) to the lvalue object (PVLV 0x1bc3388) which points to the string you've taken a substring of (PV 0x1bcd490), and the lvalue object has one piece of magic to point to the functions implementing extraction and replacement of substrings. That's obviously about as simple as it can get.

        Just for fun, dump an array containing the relevant info: Dump [\'the quick brown fox', 10, 5];