pudge has asked for the wisdom of the Perl Monks concerning the following question:

Long story short: we want to mark strings so that later we can do something with them, even if they get embedded in other strings. So we figured, hey, let's try overloading. It is pretty neat. I can do something like:
    my $str = str::new('<encode this later>');
    my $html = "<html>$str</html>";
    print $html;          # <html><encode this later></html>
    print $html->encode;  # <html>&lt;encode this later&gt;</html>
It does this by overloading the concatenation operator to build a new array-based object containing the plain string "<html>", the object wrapping "<encode this later>", and the plain string "</html>". These can nest arbitrarily. On encode, it leaves the plain strings alone and encodes only the object strings. But if you stringify the object, it just spits it all out as plain strings. This works well, except that in some cases it stringifies for no apparent reason. The script below shows the behavior, which I've reproduced on 5.10 through 5.22.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;
    $Data::Dumper::Sortkeys=1;

    my $str1 = str::new('foo');
    my $str2 = str::new('bar');
    my $good1 = "$str1 $str2";
    my $good2;
    $good2 = $good1;
    my($good3, $good4);
    $good3 = "$str1 a";
    $good4 = "a $str1";
    my($bad1, $bad2, $bad3);
    $bad1 = "a $str1 a";
    $bad2 = "$str1 $str2";
    $bad3 = "a $str1 a $str2 a";
    say Dumper { GOOD => [$good1, $good2, $good3], BAD => [$bad1, $bad2, $bad3] };

    $bad1 = ''."a $str1 a";
    $bad2 = ''."$str1 $str2";
    $bad3 = ''."a $str1 a $str2 a";
    say Dumper { BAD_GOOD => [$bad1, $bad2, $bad3] };

    package str;
    use Data::Dumper;
    $Data::Dumper::Sortkeys=1;
    use strict;
    use warnings;
    use 5.010;
    use Scalar::Util 'reftype';
    use overload (
        '""' => \&stringify,
        '.' => \&concat,
    );

    sub new {
        my($value) = @_;
        bless((ref $value ? $value : \$value), __PACKAGE__);
    }

    sub stringify {
        my($str) = @_;
        #say Dumper { stringify => \@_ };
        if (reftype($str) eq 'ARRAY') {
            return join '', @$str;
        }
        else {
            $$str;
        }
    }

    sub concat {
        my($s1, $s2, $inverted) = @_;
        #say Dumper { concat => \@_ };
        return new( $inverted ? [$s2, $s1] : [$s1, $s2] );
    }

    1;
I want all of these to be dumped as objects, not strings. But the "BAD" examples are all stringified. All of the "BAD" examples assign a string object that is being concatenated in that same statement to a variable that was declared earlier. If I declare the variable in the same statement, or do the concatenation beforehand, or add an extra concatenation (beyond the interpolated string concat), then it works fine. This is nuts. The result of the script:
    $VAR1 = {
      'BAD' => [
        'a foo a',
        'foo bar',
        'a foo a bar a'
      ],
      'GOOD' => [
        bless( [
          bless( [
            bless( do{\(my $o = 'foo')}, 'str' ),
            ' '
          ], 'str' ),
          bless( do{\(my $o = 'bar')}, 'str' )
        ], 'str' ),
        $VAR1->{'GOOD'}[0],
        bless( [
          $VAR1->{'GOOD'}[0][0][0],
          ' a'
        ], 'str' )
      ]
    };

    $VAR1 = {
      'BAD_GOOD' => [
        bless( [
          '',
          bless( [
            bless( [
              'a ',
              bless( do{\(my $o = 'foo')}, 'str' )
            ], 'str' ),
            ' a'
          ], 'str' )
        ], 'str' ),
        bless( [
          '',
          bless( [
            bless( [
              $VAR1->{'BAD_GOOD'}[0][1][0][1],
              ' '
            ], 'str' ),
            bless( do{\(my $o = 'bar')}, 'str' )
          ], 'str' )
        ], 'str' ),
        bless( [
          '',
          bless( [
            bless( [
              bless( [
                bless( [
                  'a ',
                  $VAR1->{'BAD_GOOD'}[0][1][0][1]
                ], 'str' ),
                ' a '
              ], 'str' ),
              $VAR1->{'BAD_GOOD'}[1][1][1]
            ], 'str' ),
            ' a'
          ], 'str' )
        ], 'str' )
      ]
    };
The behavior makes no sense to me. I'd like to understand why it works this way, and I'd like to find a workaround.
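
The posted script leaves out the encode method mentioned at the top. For readers who want the whole picture, here is a minimal sketch of how such a method might walk the nested structure. It assumes the same two shapes as the script (a blessed scalar ref for a marked string, a blessed array ref for a concatenation). The package name MarkedStr, the entity table, and the use of explicit "." instead of interpolation are illustrative assumptions, not the original application's code.

```perl
use strict;
use warnings;

{
    package MarkedStr;    # hypothetical name; the post calls its class "str"
    use Scalar::Util qw(blessed reftype);
    use overload
        '""' => \&stringify,
        '.'  => \&concat;

    # Wrap a plain string (scalar ref) or adopt an existing array ref.
    sub new {
        my ($value) = @_;
        return bless( ( ref $value ? $value : \$value ), __PACKAGE__ );
    }

    # Flatten everything to a plain string without encoding anything.
    sub stringify {
        my ($self) = @_;
        return reftype($self) eq 'ARRAY' ? join( '', @$self ) : $$self;
    }

    # Record both operands instead of joining them.
    sub concat {
        my ( $s1, $s2, $inverted ) = @_;
        return new( $inverted ? [ $s2, $s1 ] : [ $s1, $s2 ] );
    }

    # Encode marked strings, pass plain strings through untouched.
    sub encode {
        my ($self) = @_;
        if ( reftype($self) eq 'ARRAY' ) {
            return join '',
                map { blessed($_) && $_->isa(__PACKAGE__) ? $_->encode : $_ } @$self;
        }
        # Assumed escaping table, for illustration only.
        my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
        ( my $text = $$self ) =~ s/([&<>"])/$entity{$1}/g;
        return $text;
    }
}

my $marked = MarkedStr::new('<encode this later>');
my $html   = '<html>' . $marked . '</html>';    # explicit ".", no interpolation

print $html->stringify, "\n";
print $html->encode,    "\n";
```

The sketch sticks to explicit concatenation so that it does not depend on the interpolation behavior this question is about.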

Replies are listed 'Best First'.
Re: Overloading Weirdness
by haukex (Bishop) on Jun 23, 2018 at 08:31 UTC

    I boiled your examples down to the case of "$str1 $str2", and I can confirm the behavior you observed on Perl 5.26 as well. The answer seems to be that Perl optimizes away the extra stringification step in certain cases - note how in the below, stringify becomes ex-stringify:

    I'm not yet sure what you could do about this, or even whether you can do anything about it, because apparently the final stringification is always part of Perl's intended operations and is merely optimized away in some cases. Plus, at the moment I can't remember and can't find in the docs whether Perl even makes any guarantees about the order in which the concatenations and stringifications are executed in a case like "$str1 $str2". One might think that is equivalent to $str1." ".$str2, but apparently it is not, because the former sometimes involves an extra stringification. Normally the order doesn't really make a difference, except unfortunately when you rely on overloading. One might even argue that the aforementioned inconsistency could be seen as a bug in Perl.
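
    One way to observe the difference without reading op trees is to count calls to the overload handlers. The following is only a sketch: Probe is a hypothetical stand-in for the str class, and the counter and the report helper are made up for illustration. What the two interpolated forms print depends on the Perl version, so no expected output is shown.

```perl
use strict;
use warnings;

{
    package Probe;    # hypothetical stand-in for the str class
    our $stringified = 0;
    use overload
        '""' => sub { $stringified++; join '', @{ $_[0] } },
        '.'  => sub {
            my ( $s1, $s2, $inverted ) = @_;
            return Probe->new( $inverted ? ( $s2, $s1 ) : ( $s1, $s2 ) );
        };
    sub new { my $class = shift; return bless [@_], $class }
}

my ( $x, $y ) = ( Probe->new('foo'), Probe->new('bar') );

# Say whether a result is still an object, plus the running call count.
sub report {
    my ( $label, $result ) = @_;
    printf "%-26s result is %s, stringify calls so far: %d\n",
        $label, ( ref $result ? 'an object' : 'a plain string' ),
        $Probe::stringified;
}

my $declared = "$x $y";        # interpolation, assigned in the declaration
report( 'my $v = "$x $y"', $declared );

my $assigned;
$assigned = "$x $y";           # interpolation, assigned to an existing variable
report( '$v = "$x $y"', $assigned );

my $explicit;
$explicit = $x . ' ' . $y;     # explicit concatenation, no interpolation
report( q{$v = $x . ' ' . $y}, $explicit );
```

    If the second line reports a plain string while the other two report objects, you are seeing the behavior from the original post.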

    Of course the other possible argument is that in this case, relying on overloading is the problem. If I take a step back and look at this from an "XY Problem" angle: You've got objects that you want to behave as normal strings, except that they (loosely speaking) have an extra layer of encoding wrapped around them. You're generating HTML, so that encoding is for example "<>" to "&lt;&gt;". You're worried because you see this encoding happening too early. But my question here is: you've got some strings that should be encoded (your str objects), and others that shouldn't (plain strings). But in the end, everything is going to become a plain string anyway, so why do you care when the encoding step happens? (Maybe I'm missing something.)

    I completely understand that this kind of an implementation is "neat" :-) But why not use the existing solutions for HTML generation? For example, Template::Toolkit, Mojo::Template, ...

    By the way: Crossposted to StackOverflow. Crossposting is ok, but it is considered polite to inform about it so that efforts are not duplicated.

      Thanks. I agree that there is no guaranteed behavior, and I think it is probably right to consider it a bug. I am not sure if you're right that it should be stringifying, but maybe.

      I could get into the Why, and why your proposals don't work for us, but it's a long and boring story. The bottom line is that it is an old legacy app and we need the delayed decision.

      Thanks again!

        The bottom line is that it is an old legacy app and we need the delayed decision.

        Ok, well the best suggestion I can come up with at the moment is to not overload stringification and make that an explicit method call. That would allow you to weed out those cases where the objects are stringified when you don't want them to be.
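
        A rough sketch of that idea, with one variation: instead of removing the "" overload entirely, it keeps one that croaks, so any implicit stringification shows up with a file and line number. StrictStr and render are hypothetical names, and the class is simplified to arrays only.

```perl
use strict;
use warnings;

{
    package StrictStr;    # hypothetical name
    use Carp qw(croak);
    use Scalar::Util qw(blessed);
    use overload
        '.'  => \&concat,
        '""' => sub { croak 'StrictStr stringified implicitly; call ->render instead' };

    sub new { my $class = shift; return bless [@_], $class }

    sub concat {
        my ( $s1, $s2, $inverted ) = @_;
        return StrictStr->new( $inverted ? ( $s2, $s1 ) : ( $s1, $s2 ) );
    }

    # Explicit replacement for the overloaded stringification.
    sub render {
        my ($self) = @_;
        return join '', map { blessed($_) ? $_->render : $_ } @$self;
    }
}

my $str  = StrictStr->new('foo');
my $html = 'a ' . $str . ' a';    # explicit "." keeps the object

print $html->render, "\n";        # explicit call, never hits the trap

# Forcing a stringification now dies instead of silently flattening.
my $ok = eval { my $plain = "$html"; 1 };
print "caught: $@" unless $ok;
```

        The trade-off is that anything that stringifies the object on purpose (print, hash keys, eq) has to be changed to call the method, which may be a lot of churn in a legacy app.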

Re: Overloading Weirdness
by Athanasius (Archbishop) on Jun 23, 2018 at 07:12 UTC

    Hello pudge,

    I tried separating your str package into its own module/file, and then useing it in the main script:

        use strict;
        use warnings;
        use 5.010;
        use Data::Dumper;
        $Data::Dumper::Sortkeys = 1;
        use lib qw(.);
        use str;
        ...

    The output may be closer to what you were wanting (if I’ve understood correctly):

    I don’t pretend to know exactly what’s going on (I have the ’flu, my head is more than usually woolly), but perhaps this will help you towards a solution.


    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      "." means the current work directory, not the script's directory.

      use lib qw(.);
      should be
      use FindBin qw( $RealBin ); use lib $RealBin;
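
      The difference is easy to check from inside a script. A small sketch using only the core modules Cwd and FindBin; nothing here is specific to the str example:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use FindBin qw($RealBin);
use lib $RealBin;    # search the script's own directory for modules

my $cwd = getcwd();
print "current working directory: $cwd\n";
print "script directory:          $RealBin\n";
```

      Run from any directory other than the one holding the script, the two lines differ, and use lib qw(.) would search the first one.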

        Hello ikegami,

        Thanks for that! I had a hard time trying to think of a situation in which the current working directory would be different to the script’s directory (in the absence of explicit calls to chdir, of course). Finally, the penny dropped: if I invoke a script foo/bar/ from the foo directory:

        perl bar/

        the script’s directory is foo/bar, but the current working directory is still foo!

        Thanks again,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Regardless, if the difference is that the class is in a separate file, that would be super weird.
      What version of perl is that? I saw no difference whether it was in a separate file. Thanks!

        I was using Strawberry Perl 5.26.0 (64-bit), but I get the same result on my various Strawberry Perls going back to 5.10.1 (32-bit).

        It occurred to me to try keeping the str package in the same file as the main code, but moving it to the top of the file:

            package str;
            ...
            sub concat { ... }

            package main;
            use strict;
            use warnings;
            ...

        And the output is the same as was obtained by moving the str package into a separate file. So the issue is (I think) not whether str is in a separate file, but whether it’s “visible” to the Perl compiler when the main code is parsed.

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Overloading Weirdness
by ikegami (Patriarch) on Jun 23, 2018 at 22:02 UTC

    I recommend that you use the perlbug command line tool to ask p5p to explain the outcome of the following cleaned up version of your program. Include the output.

        use strict;
        use warnings;
        use feature qw( say );

        use Data::Dumper qw( Dumper );

        BEGIN {
            package str;

            use strict;
            use warnings;
            use feature qw( say );

            use overload
                '""' => \&stringify,
                '.' => \&concat;

            sub new {
                my $class = shift;
                bless([ @_ ], $class)
            }

            sub stringify {
                join '', @{ $_[0] }
            }

            sub concat {
                my ($s1, $s2, $inverted) = @_;
                return ref($s1)->new( $inverted ? ($s2,$s1) : ($s1,$s2) );
            }

            $INC{""} = 1;
        }

        sub _dump {
            local $Data::Dumper::Indent = 0;
            local $Data::Dumper::Terse = 1;
            local $Data::Dumper::Sortkeys = 1;
            return Dumper(@_);
        }

        {
            my $str = str->new('foo');

            my $good = "a $str a";
            say "good: "._dump($good);

            my $bad;
            $bad = "a $str a";
            say "bad: "._dump($bad);
        }


        good: bless( [bless( ['a ',bless( ['foo'], 'str' )], 'str' ),' a'], 'str' )
        bad: 'a foo a'