PerlMonks  

Alternative to bytes::length()

by creamygoodness (Curate)
on Dec 23, 2009 at 01:04 UTC (id://814027)

creamygoodness has asked for the wisdom of the Perl Monks concerning the following question:

Greets,

I have often seen people badmouth the bytes pragma, but there's one thing I use it for: cheaply identifying empty strings with bytes::length() when the strings may be carrying the SVf_UTF8 flag. The length() function can be inefficient for such strings, because it must traverse the entire buffer counting characters:
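For readers less familiar with the internals, here is a minimal sketch of the distinction the question relies on, using only core Perl: length() counts characters, bytes::length() counts bytes in the internal UTF-8 buffer, and both return 0 only for an empty string. The variable names are illustrative, and the byte count assumes the usual UTF-8 internal representation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without switching on byte semantics

my $smiley = "\x{263a}";    # one character, three bytes when stored as UTF-8
my $empty  = $smiley;
chop $empty;                # now empty; it started out as a UTF-8 string

printf "smiley: utf8 flag=%d chars=%d bytes=%d\n",
    utf8::is_utf8($smiley) ? 1 : 0, length($smiley), bytes::length($smiley);
printf "empty:  chars=%d bytes=%d\n", length($empty), bytes::length($empty);
```

On a typical perl this reports 1 character and 3 bytes for the smiley, and 0 for both counts on the empty string, which is why either function works as an emptiness test; the question is only which one is cheaper.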

marvin@smokey:~/perltest $ perl compare_length_efficiency.pl
        Rate  utf8 bytes
utf8  4.35/s    --  -98%
bytes  185/s 4154%    --
marvin@smokey:~/perltest $

use strict;
use warnings;
use Benchmark qw( cmpthese );

# Make bytes:: functions available, but use character semantics.
use bytes;
no bytes;

cmpthese( 100, {
    bytes => sub {
        my $smileys = "\x{263a}" x 10_000;
        chop($smileys) while bytes::length($smileys);
    },
    utf8 => sub {
        my $smileys = "\x{263a}" x 10_000;
        chop($smileys) while length($smileys);
    },
} );

Is there an efficient alternative to bytes::length() for this use case elsewhere in core?

Re: Alternative to bytes::length()
by ikegami (Patriarch) on Dec 23, 2009 at 01:28 UTC

    Sometimes I wish for a boolean context, so that length could simply report whether the string is empty without counting characters. (It would be great for grep too.) Or maybe the character length should be stored in string variables?

    I can't think of any replacement short of turning off the flag, calling length, then restoring it.
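The "turn off the flag, call length, restore it" idea could look roughly like this. It is a sketch, not a recommendation: the helper name byte_length_via_flag is hypothetical, and Encode::_utf8_off() / Encode::_utf8_on() are internal functions that the Encode documentation warns against using casually. Because it works through the @_ alias, it toggles the flag on the caller's variable in place and copies nothing.

```perl
use strict;
use warnings;
use Encode ();   # core module; _utf8_off/_utf8_on are documented as internal

# Hypothetical helper: turn the UTF-8 flag off so length() counts bytes,
# then turn it back on. $_[0] aliases the caller's variable, so no copy is
# made. Assumption: passing a read-only value (e.g. a literal) may croak.
sub byte_length_via_flag {
    my $was_utf8 = utf8::is_utf8($_[0]);
    Encode::_utf8_off($_[0]) if $was_utf8;
    my $len = length $_[0];           # byte count while the flag is off
    Encode::_utf8_on($_[0]) if $was_utf8;
    return $len;
}

my $smileys = "\x{263a}" x 3;
print byte_length_via_flag($smileys), "\n";   # byte count of the buffer
print length($smileys), "\n";                 # character count, flag restored
```

Compared with bytes::length(), this only avoids loading the bytes module; it relies on internals that are arguably more fragile.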

    What's the problem with the current solution?

    require bytes;
    sub emptystr(_) {
        no warnings 'uninitialized';
        return !bytes::length($_[0]);
    }
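A quick check of how emptystr() treats the awkward cases. This assumes perl 5.10 or later (for the _ prototype) and 5.12 or later (where length(undef) quietly returns undef); the helper is the one above, and the test values are illustrative.

```perl
use strict;
use warnings;
require bytes;   # makes bytes::length() callable; byte semantics stay off

# The helper from the reply: true when the argument (or $_ if omitted)
# is empty or undef.
sub emptystr(_) {
    no warnings 'uninitialized';
    return !bytes::length($_[0]);
}

my @cases = (
    [ 'empty string' => '' ],
    [ q{'0'}         => '0' ],
    [ 'U+263A'       => "\x{263a}" ],
    [ 'undef'        => undef ],
);

for my $case (@cases) {
    my ($label, $value) = @$case;
    printf "%-12s %s\n", $label, emptystr($value) ? 'empty' : 'not empty';
}
```

Note that '0' is correctly reported as not empty, unlike a plain truth test.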

    By the way,

    use bytes; no bytes;

    is equivalent to

    use bytes ();

    There's also the similar

    require bytes;
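To make the equivalence concrete, a minimal sketch: with use bytes () the bytes:: functions are loaded but ordinary length keeps character semantics, while a plain use bytes switches byte semantics on only for its enclosing lexical scope. The variable names are illustrative.

```perl
use strict;
use warnings;
use bytes ();    # same effect as "use bytes; no bytes;": functions loaded, pragma off

my $s = "\x{263a}" x 2;    # 2 characters, 6 bytes in the internal UTF-8 buffer

my $chars    = length $s;            # character semantics
my $bytes_fn = bytes::length($s);    # explicit byte count
my $bytes_lexical = do {
    use bytes;                       # byte semantics only inside this block
    length $s;                       # counts bytes here
};

print "$chars $bytes_fn $bytes_lexical\n";   # prints "2 6 6"
```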
      Or maybe the character length should be stored in string variables?

      I think the length in characters might be cached using MAGIC -- I know some UTF-8 stuff is.

      What's the problem with the current solution?

      There was just a post to p5p from someone who wanted to terminate the bytes pragma with extreme prejudice. I wanted to mention this use case.

        Do you have an example of this magic?

        That would be an argument for creating a new function, not for keeping bytes.

        Why would they eliminate the bytes pragma? What about those of us who aren't always manipulating character data and actually do care about the bytes themselves?
Re: Alternative to bytes::length()
by Anonymous Monk on Dec 23, 2009 at 01:44 UTC
    Why? Sounds premature to me.
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # Make bytes:: functions available, but use character semantics.
    use bytes;
    no bytes;

    cmpthese( -3, {
        bytes => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while bytes::length($smileys);
        },
        utf8 => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while length($smileys);
        },
        substr => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while ord substr($smileys,0,1);
        },
        notnot => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while !! $smileys;
        },
        '!!bytes' => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while !! bytes::length($smileys);
        },
        '!!utf8' => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while !! length $smileys;
        },
        'ne""' => sub {
            my $smileys = "\x{263a}" x 10_000;
            chop($smileys) while $smileys ne "";
        },
    } );
    __END__
               Rate substr   utf8 !!utf8  bytes !!bytes   ne"" notnot
    substr   3.75/s     --    -3%    -4%   -97%    -97%   -99%   -99%
    utf8     3.86/s     3%     --    -1%   -97%    -97%   -99%   -99%
    !!utf8   3.90/s     4%     1%     --   -97%    -97%   -99%   -99%
    bytes     119/s  3066%  2972%  2942%     --     -0%   -71%   -74%
    !!bytes   119/s  3081%  2987%  2956%     0%      --   -71%   -74%
    ne""      406/s 10740% 10419% 10314%   242%    241%     --   -10%
    notnot    451/s 11928% 11572% 11455%   280%    278%    11%     --

      Seems like ne "" is what I was looking for. :)

      The !! construct won't work because certain non-empty strings are false:

      marvin@smokey:~ $ perl -le 'print "true" if !!"0";'
      marvin@smokey:~ $ perl -le 'print "true" if !!"1";'
      true
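A small sketch of the same pitfall side by side: '0' is a non-empty string that Perl treats as false, so !! is no substitute for a length test, whereas ne '' is. The helper is_nonempty() is hypothetical; it adds a defined check so that undef does not trigger an "uninitialized" warning, which a bare ne "" would.

```perl
use strict;
use warnings;
use bytes ();   # only to compare against bytes::length()

# Hypothetical helper: defined-and-non-empty test that never calls length().
sub is_nonempty { return defined $_[0] && $_[0] ne '' }

my %cases = (
    'empty'  => '',
    'zero'   => '0',
    'one'    => '1',
    'smiley' => "\x{263a}",
);

for my $name (sort keys %cases) {
    my $s = $cases{$name};
    printf "%-6s  !!: %-5s  ne '': %-5s  bytes::length: %d\n",
        $name,
        (!!$s ? 'true' : 'false'),
        (is_nonempty($s) ? 'true' : 'false'),
        bytes::length($s);
}
```

Only the 'zero' row disagrees: !! says false, while ne '' and bytes::length() both say the string is non-empty.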

      What did you mean by "premature", though?

        Seems like ne "" is what I was looking for. :)

        Be careful. Anonymonk's benchmark is conflating an awful lot of other stuff in with the actual code you are concerned about.

        I believe (but I'm open to correction) this to be a far better benchmark, and it shows a radically different result. It might just set your mind at ease. (Or not!):

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );

        # Make bytes:: functions available, but use character semantics.
        use bytes ();

        our $smileys = "\x{263a}" x 10_000;
        our $empty = "\x{263a}";
        chop $empty;

        cmpthese -1, {
            bytes => q{
                my $c=0;
                ( bytes::length($empty) or bytes::length($smileys) ) and ++$c for 1 .. 1000;
            },
            utf8 => q{
                my $c=0;
                ( length($empty) or length($smileys) ) and ++$c for 1 .. 1000;
            },
            ord => q{
                my $c=0;
                ( ord( $empty ) or ord( $smileys ) ) and ++$c for 1 .. 1000;
            },
            'ne""' => q{
                my $c=0;
                ( $empty ne '' or $smileys ne '' ) and ++$c for 1 .. 1000;
            },
        };
        __END__
        C:\test>junk8
                 Rate bytes   ord  ne""  utf8
        bytes  1379/s    --  -72%  -75%  -76%
        ord    4992/s  262%    --  -10%  -13%
        ne""   5566/s  304%   12%    --   -3%
        utf8   5757/s  317%   15%    3%    --

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlquestion [id://814027]
Approved by toolic
Front-paged by Arunbear