
lvalue substring oddities

by ysth (Canon)
on Nov 12, 2003 at 01:42 UTC ( [id://306449]=perlmeditation )

I'd like to gather some input on how substr should act in some corner cases. I'll start with the most prominent:
    $ perl -wl
    $x = "abc";
    print "4-arg: ", substr($x="abc", 1, 1, "");
    print "3-arg: ", substr($x="abc", 1, 1) = "";
    __END__
    4-arg: b
    3-arg: c
Often people expect these to work alike. Even when they realize that the 4-arg version (as documented) returns the replaced portion of the string while the 3-arg version returns part of the new string, some discard the evidence of their eyes and think that the 3-arg version returns the string being assigned. (Past discussions of this: 108532, 158756, 191334, 298173; also see perl bugs #16834, #24069, and #24346.)

What's really happening is fairly simple to describe: the special magic value returned by substr is used twice, once by the assignment to replace 1 character at position 1, and once by print to get the 1 character now at position 1.
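To make the two uses concrete, here is a minimal sketch that spells them out as separate, ordinary substr calls. It is only an illustration of the description above, not what perl does internally; the variable name $seen is introduced here. It uses no lvalue substr, so it does not depend on how a given perl treats the magic value.

```perl
use strict;
use warnings;

my $x = "abc";

# Use 1: what the assignment does with the magic value
# (replace 1 character at offset 1 with the empty string).
substr($x, 1, 1, "");

# Use 2: what print does with the same magic value
# (read 1 character at offset 1 of the string as it is now).
my $seen = substr($x, 1, 1);

print "string: $x, seen: $seen\n";   # string: ac, seen: c
```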

A patch to perl is being considered to adjust this behaviour to be more DWIMy. It updates the length stored in the magic value whenever it is set, so substr(...)="foo" has the appearance of returning the value assigned.

What do you all think? Should this be changed in 5.8.3? In 5.10.0? Never? In one of the threads above, leaving it unchanged is passionately defended by one monk, on the grounds of backward compatibility and perl6 being the proper place to break this kind of thing. On the other hand, the current behaviour generates a lot of questions. On the third hand, the questions mostly take the form of expecting substr(a,b,c)="" to return the same as substr(a,b,c,""), which isn't really possible. On the fourth hand, this still works with the proposed change :)

Update: I think my description of the change is inadequate. Basically, lvalue substr returns a magic SV that acts as if it was tied with this access:

    # substr takes EXPR, OFFSET, LENGTH (and optionally REPLACEMENT)
    sub FETCH {
        my $self = shift;
        substr($self->{string}, $self->{offset}, $self->{length})
    }
    sub STORE {
        my ($self, $new) = @_;
        substr($self->{string}, $self->{offset}, $self->{length}, $new)
    }
(but tie magic isn't actually used). The change is equivalent to adding a $self->{length} = length($new) at the end of STORE.
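As a rough, runnable version of that model, the sketch below really does use tie, purely for illustration (the real magic value does not). The class name SubstrModel, the helper assign_then_read, and the update flag are all hypothetical names made up for this example; the flag switches between the existing behaviour and the proposed one.

```perl
use strict;
use warnings;

# Hypothetical class: emulates the magic value with tie for illustration.
package SubstrModel;

sub TIESCALAR {
    my ($class, $string_ref, $offset, $length, $update_length) = @_;
    return bless {
        string => $string_ref,     # reference to the target string
        offset => $offset,
        length => $length,
        update => $update_length,  # true: model the proposed change
    }, $class;
}

# Read LENGTH characters at OFFSET from the string as it is now.
sub FETCH {
    my $self = shift;
    return substr(${ $self->{string} }, $self->{offset}, $self->{length});
}

# Replace LENGTH characters at OFFSET; optionally remember the new length.
sub STORE {
    my ($self, $new) = @_;
    substr(${ $self->{string} }, $self->{offset}, $self->{length}, $new);
    $self->{length} = length($new) if $self->{update};
    return;
}

package main;

# Assign through the model, then read back through the same model,
# as "print substr(...) = $new" does with the real magic value.
sub assign_then_read {
    my ($string, $offset, $length, $new, $update_length) = @_;
    tie my $window, 'SubstrModel', \$string, $offset, $length, $update_length;
    $window = $new;        # STORE
    my $seen = $window;    # FETCH
    return ($seen, $string);
}

my ($old_model) = assign_then_read("abc", 1, 1, "", 0);
my ($new_model) = assign_then_read("abc", 1, 1, "", 1);
print "unchanged length: '$old_model'\n";   # unchanged length: 'c'
print "updated length:   '$new_model'\n";   # updated length:   ''
```

With the flag off, the read-back sees whatever single character now sits at offset 1; with the flag on, it sees the value that was just stored.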

Replies are listed 'Best First'.
Re: lvalue substring oddities
by Zaxo (Archbishop) on Nov 12, 2003 at 04:22 UTC

    I've become persuaded that the current behavior should be kept. It is deducible from operator precedence and, once known, is not the cause of any great confusion.

    What will be the effect of the proposed 'fix' on constructions like this?

        {
            my $foo = 'ABC-2N322-850103';
            sub prefix () :lvalue { substr $foo, 0, index( $foo, '-') }
        }
    Will the compiler keep the old value from the index? Does the associativity of assignment change? How is prefix() to know whether it is in lvalue position in a call? Any such change needs to try these things.

    Update: Actually, the code above has a surprise in it, anyway. The result of the index call is kept instead of evaluated a second time:

        {
            my $foo = 'ABC-2N322-850103';
            sub prefix () :lvalue { substr $foo, 0, index( $foo, '-') }
            sub foo () { $foo }
        }
        print foo, $/;
        my $bar = prefix = 'DEFG';
        print $bar, $/;
        print foo, $/;
        __END__
        output:
        ABC-2N322-850103
        DEF
        DEFG-2N322-850103

    After Compline,
    Zaxo

      I've become persuaded that the current behavior should be kept. It is deducible from operator precedence and, once known, is not the cause of any great confusion.
      Somewhat agree. The existing behaviour is also IMO easier to document clearly than with the proposed change.

      But to throw fuel on the fire:

          $ perl -wl
          $x = "abc";
          for $z (substr($x,1,1)) { print ":$z:"; $z = "zz"; print ":$z:" }
          __END__
          :b:
          :z:
      If you alias something to the substr or pass it as a parameter, you can get some pretty odd results currently. OTOH, this is not a lot different from this case:
      "abc"=~/.(.)/; for $z ($1) { print ":$z:"; "zz" =~ /.(.)/; print ":$z:" }
      To reply to your sub :lvalue case, the index shouldn't be kept from one call to prefix() to the next with either the existing or proposed behaviour. However there is a bug (#24200) with substr used in an :lvalue sub. That bug should be fixed whatever else happens. (The bug is that rvalue calls to the sub will not work correctly after the first lvalue call.)

      Aside from that, your code is a perfect example of how things can go wrong the way things are. Just saying (prefix = "AB") = "ABC" gets rid of the first '-'; with the proposed change it would leave $foo back in its original state.

      Update to respond to Zaxo's update: the "surprise" is exactly the subject of the thread. When you say $foo = substr() = $bar, substr returns a magic value that is first stored and then fetched. The "problem" is that both use exactly the same length, even if the store expanded or contracted the string. substr() being replaced by an :lvalue sub is exactly the same case: the sub is called only once but the magic value returned is acted on twice, both times with the originally determined length.

Re: lvalue substring oddities
by BrowserUk (Patriarch) on Nov 12, 2003 at 09:11 UTC

    Personally, I consider the current behaviour as broken as I would consider print sqrt( -10 ); printing 3.16227766016838. Except that I can conceive of the situation whereby the latter result might actually be useful. I could see someone relying upon that broken behaviour and skipping the step of detecting negative values and absing them, if that was the requirement of their application. They shouldn't, but I could see it happening.

    However, the current resultant of substr as an lvalue, whilst predictable, is so twisted, and has so many edge cases, I find it inconceivable that anyone has actually found a reasonable use for the current behaviour, much less constructed an application that relies upon it.

    I have personally encountered the situation (twice at least) where having the value assigned, returned would have slightly simplified my code.

    As such, I would consider this change fixing a broken behaviour, rather than modifying an existing, useful one, and if it served to highlight broken code, or reliance upon undocumented, broken behaviour in existing applications, then so be it. Simplistically, if adding 1 + 1 returned 3 and someone coded their app to rely upon that, we wouldn't think twice about not maintaining backwards compatibility.

    To my way of thinking, this fits in the same category.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      However, the current resultant of substr as an lvalue, whilst predictable, is so twisted, and has so many edge cases, I find it inconceivable that anyone has actually found a reasonable use for the current behaviour, much less constructed an application that relies upon it.

      Considering that 4-arg substr is a relatively new invention (5.004? 5.005?), there must have been a lot of code that uses the lvalue-ness of substr. substr has never surprised me, but then, I've never used the 3-arg form in both rvalue and lvalue context at the same time, nor have I ever used 4-arg substr as an rvalue.

      I'm not really convinced this issue is worth breaking backwards compatibility for. No doubt there is code right now that depends on this behaviour - and if the current behaviour confuses you, don't use it. Write it in two lines.

      Abigail

        Sorry, but my contention is that there almost certainly isn't code right now that depends upon this behaviour.

        The current behaviour is so inherently non-useful, almost unpredictable, that I think it impossible that anyone has found a use for it. Leaving it as it is, just means perpetuating a non-useful, unusable behaviour where a useful behaviour could be provided.

        I can't believe that you, of all people, are suggesting that the rest of the world eschew a possible, advanced behaviour just because it is not one that you personally have ever thought to try, or because it is easily achieved by the combination of two simpler ones?

        Of course this is only my opinion and that carries exactly as much weight as anyone cares to give it, but the logic of:

        Don't make something not useful into something useful because it might break something that is non-useful, but pre-existing.

        I find faintly ludicrous.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

Re: lvalue substring oddities
by pg (Canon) on Nov 12, 2003 at 03:53 UTC

    I think it should be fixed. This is a twist of human mind.

    Backward compatibility is not really a problem. Without a fix, this kind of broken interpretation will die out sooner or later. Instead of letting it die, why not just fix it and revive it? I don't expect many people or applications actually use this feature, so the compatibility problem is minimal.

    On the other hand, this kind of unnatural stuff really hurts an application's maintainability, and the programmer should be responsible for assessing it before using it.

Re: lvalue substring oddities
by shotgunefx (Parson) on Nov 12, 2003 at 08:38 UTC
    Being one of the confused people you cite, I'll chip in my two cents. I think it should be left the way it is for the sake of backward compatibility. I don't think it bites users that often, and to me it doesn't make sense to change its behaviour at this late version. I think it should be left alone until Perl 6.


    -Lee

    "To be civilized is to deny one's nature."

Node Type: perlmeditation [id://306449]
Approved by mpeppler