PerlMonks  

Re^3: Length and Chomp ??

by biohisham (Priest)
on Aug 22, 2009 at 18:53 UTC ( [id://790605]=note )


in reply to Re^2: Length and Chomp ??
in thread Length and Chomp ??

You were exact and precise when you said that length counts every character, including control characters. But the same perldoc entry for length says: "If the EXPR is in Unicode, you will get the number of characters, not the number of bytes." If we negate that statement, we arrive at: "if the EXPR is not in Unicode, there is a strong implication that we get its length in bytes instead of characters".

Update: I had the notion that, in programming, one character can be represented by one byte; afoken's gracious contribution below has solidified it.

Hence, what you said, that we get the number of characters, holds true for Unicode values. What I replied, that length is the byte length for characters not in Unicode, holds true too, since characters are bytes for values not in Unicode :).


Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

Replies are listed 'Best First'.
Re^4: Length and Chomp ??
by afoken (Chancellor) on Aug 23, 2009 at 19:22 UTC

    Some bean counting:

    With a Unicode argument, length returns the number of characters in the argument. Unicode has the (not so) new / unusual / odd property that a character may be represented by more than one byte.

    With a non-Unicode / pre-Unicode / legacy encoding argument, length still returns the number of characters in the argument. Those legacy encodings have the old / usual / familiar property that a character is represented by exactly one byte.

    So, there is no need to remember any special cases. length always returns the character count.

    Before Unicode support was added to Perl, there was no need to distinguish between byte and character, because both were equal. And as long as you don't work with Unicode, they still are. The quote from perlfunc, "if the EXPR is in Unicode, you will get the number of characters, not the number of bytes", is a hint that bytes and characters are different things when you work with Unicode, nothing more, nothing less.
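
A minimal sketch of the point above, assuming the core Encode module is available. The sample character "é" and the variable names are illustrative choices, not taken from the thread. It shows that length counts characters in whatever string it is handed: undecoded UTF-8 octets count as one character per byte, while the decoded string and a single-byte legacy encoding both count as one.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 octets for "é" (U+00E9): two bytes, not yet decoded.
my $octets = "\xC3\xA9";

# Decoding turns the two octets into a one-character Perl string.
my $chars = decode('UTF-8', $octets);

# The same character in a legacy single-byte encoding (ISO-8859-1).
my $latin1 = encode('ISO-8859-1', $chars);

my $len_octets = length $octets;   # undecoded: each byte counts as a character
my $len_chars  = length $chars;    # decoded: one character
my $len_latin1 = length $latin1;   # legacy encoding: one byte, one character

print "undecoded UTF-8: $len_octets\n";
print "decoded:         $len_chars\n";
print "ISO-8859-1:      $len_latin1\n";
```

If Perl is never told that the input is UTF-8, it treats each byte as its own character, which is why the first count comes out larger than the second.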

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
