PerlMonks  

Re: Alternative to bytes::length()

by ikegami (Patriarch)
on Dec 23, 2009 at 01:28 UTC ( [id://814035]=note )


in reply to Alternative to bytes::length()

Sometimes, I wish for a boolean context... (It would be great for grep too.) Or maybe the character length should be stored in string variables?

I can't think of any replacement short of turning off the UTF8 flag, calling length, then restoring the flag.
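For what it's worth, that flag-toggling workaround might look roughly like the sketch below. The name byte_len is made up for illustration, and Encode::_utf8_off / Encode::_utf8_on are documented by Encode as internal functions, so treat this as a demonstration of the idea rather than something to ship. It assumes the argument is a modifiable scalar, since the flag is changed in place.

```perl
use strict;
use warnings;
use Encode ();

# byte_len is a hypothetical name. It changes the UTF8 flag of its argument
# in place through the @_ alias, so it must be given a modifiable scalar.
sub byte_len {
    my $was_utf8 = utf8::is_utf8($_[0]);
    Encode::_utf8_off($_[0]);                # internal Encode function: clear the flag
    my $len = length($_[0]);                 # with the flag off, length counts bytes
    Encode::_utf8_on($_[0]) if $was_utf8;    # restore the original state
    return $len;
}

my $s = chr(0x2660) x 3;    # three characters, three UTF-8 bytes each
my $n = byte_len($s);       # byte count of the internal buffer
```

The string's contents and flag are the same after the call as before it, which is the whole point of the restore step.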

What's the problem with the current solution?

require bytes;
sub emptystr(_) {
    no warnings 'uninitialized';
    return !bytes::length($_[0]);
}

By the way,

use bytes; no bytes;

is equivalent to

use bytes ();

There's also the similar

require bytes;
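To make the distinction concrete, here is a small sketch showing that loading the module with an empty import list leaves character semantics in place, while bytes::length stays callable by its full name. The variable names are only illustrative.

```perl
use strict;
use warnings;
use bytes ();    # load bytes.pm; byte semantics are NOT enabled in this scope

my $s = chr(0x2660) x 3;          # three characters, three UTF-8 bytes each

my $chars = length($s);           # character semantics: 3
my $bytes = bytes::length($s);    # explicit byte count of the internal buffer: 9

my $scoped;
{
    use bytes;                    # lexically enables byte semantics
    $scoped = length($s);         # length now counts bytes: 9
}

print "chars=$chars bytes=$bytes scoped=$scoped\n";
```

Only the inner block sees length change meaning; outside it, length still counts characters.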

Re^2: Alternative to bytes::length()
by creamygoodness (Curate) on Dec 23, 2009 at 01:38 UTC
    Or maybe the character length should be stored in string variables?

    I think the length in characters might be cached using MAGIC -- I know some UTF-8 stuff is.

    What's the problem with the current solution?

    There was just a post to p5p from someone who wanted to terminate the bytes pragma with extreme prejudice. I wanted to mention this use case.

      Do you have an example of this magic?

      That would be an argument for creating a new function, not for keeping bytes.

        Actually, simply calling length on a scalar with UTF8=1 adds the magic.
        >perl -MDevel::Peek -e"Dump $_=chr(0x2660)x100; length $_; Dump $_"
        SV = PV(0x2379ec) at 0x1845eec
          REFCNT = 1
          FLAGS = (POK,pPOK,UTF8)
          PV = 0x184c5ec "..."\0 [UTF8 "..."]
          CUR = 300
          LEN = 304
        SV = PVMG(0x18242cc) at 0x1845eec
          REFCNT = 1
          FLAGS = (SMG,POK,pPOK,UTF8)
          IV = 0
          NV = 0
          PV = 0x184c5ec "..."\0 [UTF8 "..."]
          CUR = 300
          LEN = 304
          MAGIC = 0x1824e64
            MG_VIRTUAL = &PL_vtbl_utf8
            MG_TYPE = PERL_MAGIC_utf8(w)
            MG_LEN = 100     <---------- char length

        It's a pity that actions such as chop, appending a UTF8=0 string, etc., void the count instead of updating it.

        Note that $_ eq '' doesn't add the magic, so not only is it faster, it uses less memory.
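An emptiness check built on that comparison could be sketched as follows. The name is_empty_str is hypothetical; it treats undef as empty, which is an assumption chosen to mirror the bytes-based emptystr above.

```perl
use strict;
use warnings;

# is_empty_str is a hypothetical name. It reads the argument through the @_
# alias to avoid copying a possibly large string, and never calls length.
sub is_empty_str {
    return !defined($_[0]) || $_[0] eq '';
}

my $wide = chr(0x2660) x 100;
my @results = map { is_empty_str($_) ? 1 : 0 } (undef, '', '0', $wide);
print "@results\n";
```

Note that '0' is correctly reported as non-empty, which a plain truthiness test would get wrong.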

        Looks like you found the caching mechanism. From perlguts:
        w PERL_MAGIC_utf8 vtbl_utf8 UTF-8 length+offset cache

        As for keeping bytes... meh, my attachment to the bytes pragma extended only to that use case, as the efficiency of CORE::length() with SVf_UTF8 scalars is a bummer. I'm not even going to bother posting to p5p now that my concern has been addressed another way.

      Why would they eliminate the bytes pragma? What about those of us who aren't always manipulating character data and actually do care about the bytes themselves?

        Strings can contain bytes. You don't have to do anything special to work with bytes. use bytes; has nothing to do with manipulating bytes.

        If you need to manipulate the internal string format, either to optimize or to work with some buggy XS,
        you want utf8::upgrade or utf8::downgrade.
        If you need to encode to UTF-8 or decode from UTF-8,
        you want utf8::encode, Encode::encode, utf8::decode or Encode::decode.
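A short sketch of the difference between the two groups: upgrade and downgrade change only how the string is stored, while encode and decode change the string itself. The variable names are illustrative, and the flags are captured as 0/1 purely for display.

```perl
use strict;
use warnings;

my $s = "\xE9";                                   # one character, U+00E9, stored as one byte

my $flag_before = utf8::is_utf8($s) ? 1 : 0;      # internal UTF8 flag is off
utf8::upgrade($s);                                # switch internal storage to UTF-8
my $flag_after  = utf8::is_utf8($s) ? 1 : 0;      # flag is now on
my $unchanged   = ($s eq "\xE9" && length($s) == 1) ? 1 : 0;   # same string as before

my $encoded = $s;
utf8::encode($encoded);                           # now the two bytes \xC3\xA9
my $encoded_len = length($encoded);

my $decoded = $encoded;
utf8::decode($decoded);                           # back to the single character U+00E9
my $decoded_len = length($decoded);

utf8::downgrade($s);                              # storage back to one byte per character
my $flag_final = utf8::is_utf8($s) ? 1 : 0;
```

At no point does the value of $s change as far as eq and length are concerned; only $encoded is a different string.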

        The person probably wants to eliminate it because of that very misconception you expressed. But don't worry, if anything is ever done, it would still be available on CPAN.
