Some of the confusion may be due to history. Prior to Perl 5.8, strings were simply bytes, so length could only return a byte count. Support for character encodings was introduced in 5.8 (so says the Encode documentation; I'm not at all an encoding guru, but your question got me curious).

From what I understand of that document, if the string is marked as utf8 (a flag set in the C guts of Perl), its length is counted in characters, because Perl knows to check whether each byte is a complete character or only part of a multi-byte one. Otherwise its length is counted in bytes. You can see the flag value using is_utf8. The flag is normally set automatically when your input stream decodes the characters you read in, but if you aren't sure about the history of the string you can use that function to check its status. For more information, see the section on messing with Perl's internals in Encode.
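To make the byte/character distinction concrete, here is a minimal sketch, assuming a reasonably recent Encode and using a made-up sample string. It calls Encode's is_utf8 and decode; the variable names are just for illustration, and I haven't tried it on anything older than a modern perl.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Sample data (an assumption for illustration): "café" as raw UTF-8 octets.
# The e-acute takes two octets, 0xC3 0xA9.
my $octets = "caf\xC3\xA9";

# decode() interprets the octets as UTF-8 and returns a character string.
my $chars = decode('UTF-8', $octets);

my $octet_len = length $octets;   # counted in bytes
my $char_len  = length $chars;    # counted in characters

printf "octets: length=%d utf8 flag %s\n",
    $octet_len, is_utf8($octets) ? 'on' : 'off';
printf "chars:  length=%d utf8 flag %s\n",
    $char_len, is_utf8($chars) ? 'on' : 'off';
```

The same five octets give a length of 5 before decoding and 4 after, which is the difference described above.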

There are also functions for explicitly selecting whether your string will be treated as raw bytes or as utf8 characters, and for choosing the rules for converting back and forth between raw bytes and utf8: see the same document for encode, decode and from_to.
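A rough sketch of the three Encode functions named above, again with made-up sample strings and assuming Latin-1 and UTF-8 as the encodings of interest. Treat it as an illustration of how I read the Encode documentation, not as the one right way to do it.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# A character string: 4 characters, the last one is U+00E9 (e-acute).
my $chars = "caf\x{E9}";

# encode(): characters -> octets in the named encoding.
my $utf8_octets = encode('UTF-8', $chars);

# decode(): octets -> characters, the reverse direction.
my $round_trip = decode('UTF-8', $utf8_octets);

# from_to(): convert a buffer of octets in place from one encoding to
# another. It returns the new length in octets, or undef on failure.
my $buffer  = "caf\xE9";    # "café" as Latin-1 octets (4 bytes)
my $new_len = from_to($buffer, 'iso-8859-1', 'UTF-8');

printf "encoded length:   %d\n", length $utf8_octets;
printf "round trip equal: %s\n", $round_trip eq $chars ? 'yes' : 'no';
printf "from_to length:   %d\n", $new_len;
```

Note that from_to works on octets at both ends and modifies its first argument, whereas encode and decode return a new string.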

Update: added more information about controlling the utf8 status.

In reply to Re: Size and anatomy of an HTTP response by ELISHEVA
in thread Size and anatomy of an HTTP response by Discipulus

	PerlMonks