http://qs321.pair.com?node_id=191334

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I came across what, according to my reading of the docs, is an anomaly in the way substr works.

Is this a bug in substr, or my reading of the docs?

0.02 0.02 >$,='|'
|0.02|0.02|>$s = qq(the quick brown fox jumps over the lazy dog)
# Used this way, the return is the 4 bytes replaced.
|0.02|0.02|>print substr($s,0,4,''),$s
the |quick brown fox jumps over the lazy dog
|0.02|0.02|>print $s
quick brown fox jumps over the lazy dog
# But used this way, it is the first 4 bytes *after* the replacement occurs?
|0.02|0.02|>print substr($s,0,4)='',$s
k br|k brown fox jumps over the lazy dog
|0.02|0.02|>print $s
k brown fox jumps over the lazy dog

What's this about a "crooked mitre"? I'm good at woodwork!

(tye)Re: [substr] anomaly or mine?
by tye (Sage) on Aug 19, 2002 at 23:50 UTC

    The first is a special behavior of 4-argument substr which is rather handy. The second is just the way assignments work (what value they themselves return). Just like:

    my $x= "foo"; print $x= "bar";
    prints "bar".
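To make the two well-defined cases concrete, here is a minimal sketch (the variable names are made up for illustration). It only exercises 4-argument substr and a plain scalar assignment; it deliberately does not print the value of an lvalue-substr assignment, since that is the behaviour in dispute.

```perl
use strict;
use warnings;

my $s = "the quick brown fox";

# 4-argument substr: replaces in place and returns the text it removed.
my $removed = substr($s, 0, 4, '');

# Plain scalar assignment: the value of the expression is the value assigned.
my $x = "foo";
my $assigned = ($x = "bar");

print "removed:  '$removed'\n";
print "string:   '$s'\n";
print "assigned: '$assigned'\n";
```

Here $removed should hold 'the ', $s should be left as 'quick brown fox', and $assigned should hold 'bar'.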

    Update: Actually, I expected substr(...)= expr to return expr. Rereading, I see that it isn't. I'd call that a bug.

            - tye (but my friends call me "Tye")

      Using that logic, then, you would expect:

      print substr($s,0,4)='',$s;

      ..to print:

      |k brown fox jumps over the lazy dog

      ..as it would return what it was just set to.

      perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

        Thank you.


        What's this about a "crooked mitre"? I'm good at woodwork!

      If it's a bug, will it be picked up automatically as it's been mentioned here, or should I report it? If so, where?

      Or will you or one of the backroom guys here?


      What's this about a "crooked mitre"? I'm good at woodwork!
      Re-re-reading it, you'd expect substr(...)=expr to return what the substr was set to. In this case, it returns the value of the substr after the operation was complete. Your 4-argument, err, argument does seem to be the most persuasive.

      Bug? Feature? Undefined behavior? I think a case could be made for the validity of either mode of operation.
      I agree with your update.
Re: [substr] anomaly or mine?
by sauoq (Abbot) on Aug 19, 2002 at 23:53 UTC
    The behaviour makes sense to me but I don't think the docs are as clear as they might be. They do say:
    An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace parts of the EXPR and return what was there before in one operation, just as you can with splice().

    The key phrase in the above being "one operation." When used as an lvalue it does it in two operations, the assignment and then the substr and you get the result of the substr().
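The parallel with splice() that the docs draw can be sketched as follows. This is only an illustration with made-up data; both calls replace a range and hand back what was there before, in one operation.

```perl
use strict;
use warnings;

# splice: replace elements 1..2 of the array and get the removed elements back.
my @items = (1 .. 4);
my @gone  = splice(@items, 1, 2, 'x');

# 4-argument substr: replace characters 1..2 of the string and get the old text back.
my $text = 'ABCD';
my $old  = substr($text, 1, 2, 'xyz');

print "splice removed: @gone, left: @items\n";
print "substr removed: $old, left: $text\n";
```

After this runs, @gone should be (2, 3) with @items left as (1, 'x', 4), and $old should be 'BC' with $text left as 'AxyzD'.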

    Update: I was becoming more and more convinced that my guess was the correct one. Then I discovered this:

    $ perl -le '$,="|";$s="foobar"; print substr($s,-2,2)="z",substr($s,-2,2)'
    z|bz

    Given that the substring is different in the two cases, it isn't just doing the substr() after the replacement. I'd say that is a bug and I think I have to agree with tye. It seems that the "right" behavior would be to return the string assigned, as in any assignment.

    I wonder how many obfus changing that will ruin...

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: [substr] anomaly or mine?
by PodMaster (Abbot) on Aug 20, 2002 at 01:25 UTC
    #!/usr/bin/perl -w
    use strict;
    $\="\n";

    ## first, to BrowserUk's misunderstanding, an explanation
    my $s = 'ABCD';
    print substr($s,0,2)='';   # EXPECT 'CD', what's left
    print $s;                  # CD

    $s = 'ABCD';
    print substr($s,0,2,'');   # EXPECT 'AB',
    print $s;                  # CD

    # perldoc -f substr
    #   An alternative to using substr() as an lvalue is to specify the
    #   replacement string as the 4th argument. This allows you to
    #   replace parts of the EXPR and return what was there before in
    #   one operation, just as you can with splice().

    my @B = 1..4;
    print splice(@B,0,2,());   # expect 12
    print @B;                  # expect 34

    # perldoc -f splice
    #   Removes the elements designated by OFFSET and LENGTH from an
    #   array, and replaces them with the elements of LIST, if any. In
    #   list context, returns the elements removed from the array.

    # now to tye's argument, saying that substr($s,0,2)= EXPR
    # doesn't return EXPR and is therefore a bug

    # perldoc -f substr
    #   You can use the substr() function as an lvalue, in which case
    #   EXPR must itself be an lvalue.

    $s = '1234'; @B=();
    $B[0] = substr($s,0,2) = 'ab';
    print $s;      # EXPECT ab34
    print "@B";    # EXPECT ab

    print "ASSIGN '' ";
    $s = '1234'; @B=();
    $B[0] = substr($s,0,2) = '';
    print $s;      # expect 34
    print "@B";    # EXPECT 34, cause '' wouldn't be an lvalue

    print "ASSIGN undef ";
    $s = '1234'; @B=();
    $B[0] = substr($s,0,2) = undef;
    print $s;      # expect 34
    print "@B";    # EXPECT 34, cause undef wouldn't be an lvalue

    ## CONCLUSION
    ## if '' and undef are lvalues, then this is a feature, and not a bug
    __END__
    CD
    CD
    AB
    CD
    12
    34
    ab34
    ab
    ASSIGN ''
    34
    34
    ASSIGN undef
    Use of uninitialized value in scalar assignment at BrowserUk.substr.pl line 51.
    34
    34

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      In the perlfunc:substr docs, I can find only two references to what it returns:

      • Extracts a substring out of EXPR and returns it. (2 and 3 arg version)

      and

      • This allows you to replace parts of the EXPR and return what was there before. (4-arg version)

      The latter isn't really relevant as I'm using the 3-arg version, but it does describe one of the two behaviours I would have expected.

      If the 3-arg version as an lvalue were consistent with the 4-arg version, I would expect the replaced part ('quick ' in my short demo program below) to be returned.

      If it were consistent with assignment, then I would expect either the thing being assigned ('') or the result of the assignment ('the brown fox').

      What is actually being returned is 'brown '.

      What is happening is that the substitution (deletion) is being performed using the offset and length, AND THEN the same values are being re-used to extract a return value FROM THE RESULTANT of the assignment.

      Whilst I can explain what is happening, using the offset and length *twice* is at least strange. Doing so in order to provide a return value that bears no relationship to either the sub-string specified by the programmer or the resultant of the assignment simply makes no sense at all... leastwise, not to me.

      #! perl -w
      my $s = qq(the quick brown fox);
      print substr($s, 4, 6) = '', $/;
      __END__
      #output
      C:\test>191356
      brown
      C:\test>
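Whichever way the question is settled, code can sidestep it by never using the value of the assignment expression. A minimal sketch, assuming a hypothetical helper called replace_range (not part of Perl or of this thread), that reads the old text with a plain rvalue substr and then assigns as a separate statement:

```perl
use strict;
use warnings;

# Hypothetical helper: replace part of a string and report both the text
# that was removed and the text put in its place, without relying on the
# value of the lvalue-substr assignment itself.
sub replace_range {
    my ($ref, $offset, $length, $new) = @_;
    my $old = substr($$ref, $offset, $length);   # rvalue use: read only
    substr($$ref, $offset, $length) = $new;      # lvalue use: statement only
    return ($old, $new);
}

my $s = 'the quick brown fox';
my ($old, $new) = replace_range(\$s, 4, 6, '');

print "old: '$old'\n";
print "new: '$new'\n";
print "now: '$s'\n";
```

With the demo string above, $old should be 'quick ', $new should be '', and $s should end up as 'the brown fox', which are the candidate return values discussed in this post.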

      With respect to "if '' and undef are lvalues, then this is a feature, and not a bug":

      You rightly point out that the docs say:

      You can use the substr() function as an lvalue, in which case EXPR must itself be an lvalue.

      However,

      substr EXPR,OFFSET,LENGTH,REPLACEMENT

      substr EXPR,OFFSET,LENGTH

      substr EXPR,OFFSET

      The EXPR that must be an lvalue is the first parameter to substr, not the value being assigned to it.
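A small sketch of that distinction, with made-up variable names: the first argument has to be something assignable (a scalar variable, a hash element, and so on), while the right-hand side can be any ordinary expression.

```perl
use strict;
use warnings;

# EXPR is a scalar variable; the assigned value is a computed expression.
my $s = '1234';
substr($s, 0, 2) = uc 'ab';

# EXPR is a hash element, which is also an lvalue.
my %rec = (name => 'perl');
substr($rec{name}, 0, 1) = 'P';

print "$s\n";
print "$rec{name}\n";
```

This should leave $s as 'AB34' and $rec{name} as 'Perl'; in neither case is the value being assigned itself an lvalue.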

      I can explain what is happening. I still contend that it is neither documented that way nor intuitively expected. Nor, in my opinion, is it sensible.

      As it is, I am sufficiently convinced that this is a bug that I will send in a perlbug report containing the bulk of this post, minus references to our debate, along with a link to this entire thread, and let them decide whether this calls for a documentation change, a code fix, or is simply a vagary of "BrowserUk's misunderstanding".


      What's this about a "crooked mitre"? I'm good at woodwork!
      Your first EXPECT does not fully distinguish the actual behaviour that feels anomalous:
      my $s = 'ABCDE'; print substr($s,0,2)=''; # EXPECT 'CD' or 'CDE'?
      The actual is 'CD'; intuition would expect 'CDE'.

      perhaps a better test case is

      my $s = 'ABCDE'; print substr($s,0,2)='1'; # EXPECT '1C' or '1CDE'?
      Again, the actual is "1C".

      --Dave

        My intuition would expect:

        my $s = 'ABCDE';
        print substr($s,0,2)='',$/;    # EXPECT '', GET 'cd'?
        print substr($s,0,2)='foo',$/; # EXPECT 'foo', GET 'fo'?

        ..in parallel with all other assignments:

        my $s = 'ABCDE';
        print $s='',$/;    # EXPECT ''
        print $s='foo',$/; # EXPECT 'foo'

        perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

Re: [substr] anomaly or mine?
by tadman (Prior) on Aug 19, 2002 at 23:46 UTC
    I think that's what they would term an undefined behaviour of the function. In other words, you should use it as an lvalue, or as an rvalue, but not as both at the same time.

      Um... If I strip away the comments and the output, you'll see that I am only using it as an lvalue or an rvalue, I think?

      print substr($s,0,4,''),$s # rvalue?

      print substr($s,0,4)='',$s # lvalue?


      What's this about a "crooked mitre"? I'm good at woodwork!
        In the first case, this is a pure rvalue usage. In the second case, due to the assignment, you're using it as an lvalue, but then, due to the print, turning the resulting expression back into an rvalue. This double conversion seems to be the root of your problems.

        I was expecting it to return '', since that's what the assignment operator usually does. As soon as you introduce the lvalue-function angle, I think you're playing with fire.
Re: [substr] anomaly or mine?
by Aristotle (Chancellor) on Aug 19, 2002 at 23:52 UTC
    An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace parts of the EXPR and return what was there before in one operation, just as you can with splice().
    That could be read as though without using the fourth parameter, you cannot get what was there before - but I get the feeling that that's twiddling words too much and it really is not expected behaviour. (I'm a lot of help, aren't I?)

    Makeshifts last the longest.

      Using the 4 parm form is working as expected! It's the other version that is giving me a headache!


      What's this about a "crooked mitre"? I'm good at woodwork!
A Bug in the Documentation or in Perl?
by tlhf (Scribe) on Aug 20, 2002 at 10:17 UTC
    I've had code which uses substr($foo, n, n) = ''; as an rvalue. And I'm sure others have had code which used this too. If this usage is considered a bug and Perl was resultingly changed, then this code will be broken. And this code wasn't written on a bug - it was written on undefined behaviour. As such, then any change to the perl implementation should be considered a core language change.

    With Perl 5 coming towards the latter half of its usage, is it sensible to be making core language changes? Surely it would just make more sense to define this behaviour - and change the documentation to reflect this difference.

    tlhf
    Just wanting to be able to upgrade Perl5 without fear of code breakage.

      See, undefined behaviour is just that.. if you rely on it, you are submitting to surprises. Of course, your stance is a legit one, and I do think that cases like yours have to be considered; old code should not be broken lightheartedly, even if it was written with dirty practices. But I do not believe that this argument can outweigh significant reasons that favour a change.

      I'm not saying that substr should be changed, rather than the current behaviour just becoming defined and documented. Whether there are any strong arguments in favour of a change in this specific instance is yet to be seen.

      I do however feel that reliance on poorly documented or defined behaviour or even on outright bugs is too common a bad habit and anything that will teach people to refrain from it can only be a good thing in the long term.

      After all, no one is forcing you to upgrade - Perl4, 5.0005_003 etc all still run fine. Old codebase does not have to be fixed right this instant.

      I realize this is somewhat of an ivory tower opinion and that things in the real world don't quite always work the way I would like them to. However, unless you begin somewhere and sometime, that's never going to change. And in Germany they say (roughly), a terrifying end is better than neverending terror.

      In the end, I guess, it comes down to trying to find the best compromise. Legacy code should not hinder progress - but of course progress at all cost is no better either. Anyway, I think I'll quit this longwinded, meandering tangential post here before it goes much further.

      ____________
      Makeshifts last the longest.
        In a quick response, and some clarification:

        Yes, I agree that using undefined behaviour is bad style. That's an extremely valid point. I try at all costs to avoid such practices nowadays. But on the flipside, in Perl many things are undefined. Abigail-II has pointed out in the past that even $i = $i++; has undefined outcome, although all would agree that $i should not increment. The full debate started innocently here shows that there are many undefined parts to the very simplest of code. Yet most wouldn't complain about such code's use.

        And I definitely agree that this should be cleared up. The point I was (poorly) trying to get at is that this should not be fixed now. We're going to break pretty much the entirety of Perl5 when Perl6 is released. We could sort out substr's odd behaviour then, and no harm will come of it.

        If this glitch was brought to attention a few years ago then I would have fully advocated a bug fix. But with it so late in the day now for Perl5, I genuinely believe the advantages of fixing this problem will be greatly outweighed by the problems it will cause. Perl5 is considered warty - and Perl6 is a fixup of those warts (with a few new groovy features added, naturally). I think it would make much more sense to just define this as it is for Perl5 in perldoc -f substr, and clean it in Perl6.

        tlhf
        Update: Slight typo; final $i was originally $i++;

      I've had code which uses substr($foo, n, n) = ''; as an rvalue. And I'm sure others have had code which used this too. If this usage is considered a bug and Perl was resultingly changed, then this code will be broken.

      Eh? Are you serious? Then, please tell, what is this snippet supposed to do? Shouldn't you have used the 4-argument substr() instead?

      IMO, the value of an assignment is, commonly, what is assigned. This behaviour deviates from that generic rule in a very bizarre way: it does not return the new value, nor the old one. It just looks totally unreliable to me.

      And this code wasn't written on a bug - it was written on undefined behaviour. As such, then any change to the perl implementation should be considered a core language change.

      No way José. A change in previously undefined behaviour is called an improvement on the specification. You should never ever depend on undefined, or unspecified, behaviour. It is the prerogative of the Perl Porters to change that behaviour without any good reason.

      And changing the behaviour so that in some way it would start to make sense, either in line with the rest in the language and returning the newly assigned value, or returning the replaced value, which would be inconsistent with the rest but which might be very handy; well: these make sense. To me, anyway.

      I don't think your idiom is used that much.