
Size and anatomy of an HTTP response

by Discipulus (Abbot)
on Dec 15, 2010 at 11:04 UTC (#877238=perlquestion)

Discipulus has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise ones,
I'm practicing with LWP and the HTTP::* modules by writing a tool that times a web server's response in detail.

The tool is HTML-page oriented, meaning that it will distinguish the different elements that make up the HTML page resulting from the request. That is, I will time separately the reception of the body and of the content included in it.

At first I was relying on the content_length header, but I found that not every web server sends it, so (now the questions):

1) can I safely assume that $size_body = length ($response->content) ?

Now the confusion starts: length says: "Returns the length in characters of the value of EXPR." while the Camel book states: "Returns the length in bytes of the scalar value val.".
Following up on this little difference I discovered the bytes pragma (on this page I learned that chr(400) is one character but two bytes), and a new question was born:

2) Considering that I'm working with unpredictable HTML output, do I have to do this esoteric operation to count the bytes I receive back: $size_body = bytes::length ($response->content)? And:

3) Do I also have to consider the content type and the encoding (returned in the headers) to calculate a more accurate size in bytes of the response content (since ultimately the data comes to me as bytes over the network)?

4) To be accurate, do I have to time and size the response code and the message received before the headers, then the headers, then the body, and also the same (the response code and message too?) for every element embedded in the page?

Thanks for your attention and for any answers.


there are no rules, there are no thumbs..

Replies are listed 'Best First'.
Re: Size and anatomy of an HTTP response
by moritz (Cardinal) on Dec 15, 2010 at 11:14 UTC
    Now start the confusion: length says: "Returns the length in characters of the value of EXPR." while the camel book states: "Returns the length in bytes of the scalar value val.".

    For a string that hasn't been decoded, Perl assumes Latin-1 as the encoding. And in Latin-1, the number of bytes and number of characters is the same.

    Now if you took decoded_content, you'd have to be careful, but I don't think that content itself decodes anything.
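To illustrate the point above without a network call, here is a minimal sketch using only the core Encode module. The variable $raw is a hypothetical stand-in for what $response->content would hold (undecoded octets), and $text stands in for a decoded string; neither name comes from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw body as it would arrive off the wire:
# the word "café" encoded as UTF-8, so the last letter takes two octets.
my $raw = "caf\xC3\xA9";

# Decoding turns the octets into characters, as a decoded body would be.
my $text = decode('UTF-8', $raw);

# On the undecoded string, length() counts octets;
# on the decoded string, it counts characters.
printf "raw=%d decoded=%d\n", length($raw), length($text);
# prints: raw=5 decoded=4
```

So for sizing what came over the wire, the undecoded string is the one to measure.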

Re: Size and anatomy of an HTTP response
by ELISHEVA (Prior) on Dec 15, 2010 at 11:48 UTC

    Some of the confusion may be due to history. Prior to Perl 5.8 strings were simply bytes so length could only return the bytes. Support for character encodings was introduced in 5.8 (so says this: Encode - I'm not at all an encoding guru, but your question got me curious).

    From what I understand of that document, if the string is marked as utf8 (a bit set in the C guts of Perl), its length will be counted as characters because it knows to check if each byte is a complete or partial character. Otherwise its length is counted as bytes. You can see the flag value using is_utf8. It is normally set automatically to your input stream's encoding when you read in characters, but if you aren't sure about the history of the string you can use that function to check its status. For more information, see the section on messing with Perl's internals in Encode.

    There are also methods for explicitly selecting whether your string will be read as bytes or utf8 octets and for choosing the rules for converting back and forth from raw bytes to utf8 - see the same document for encode, decode and from_to.

    Update: added more information about controlling the utf8 status.

        More precisely, length() always returns what it thinks is the number of characters in the string. This "thinking" relies on the value of the utf8 flag. The reply you linked to referred to a "unicode string", i.e. one with its unicode flag set.

        If the utf8 flag is set, it assumes each byte is an octet and glues octets together into single characters as needed, so you might have bytes = characters or not. If the utf8 flag is NOT set, then it counts pure bytes on the assumption that there is a one-to-one relationship between bytes and characters. In that case there is no difference between the byte count and the character count. If your utf8 octets are all in the ASCII range you will never notice the difference and byte count will equal character count, but if for some reason you have a string full of utf8 octets and the utf8 flag gets switched off (perhaps you opened a stream in raw mode but the file was filled with non-ASCII utf8 octets?), length will return the number of bytes, NOT the number of characters.

        Here is a quick example of the difference a flag makes. Nothing has changed in the content of $s. Only the utf8 bit has been changed, and presto the length goes from 1 to 2.

        use Encode;
        my $s=chr(0x0160);
        printf "chr=<%s> utf8-flag=%s length=%d\n"
          , $s, Encode::is_utf8($s)?'yes':'no', length($s);
        #outputs: chr=<?> utf8-flag=yes length=1

        Encode::_utf8_off($s);
        printf "chr=<%s> utf8-flag=%s length=%d\n"
          , $s, Encode::is_utf8($s)?'yes':'no', length($s);
        #outputs: chr=<?> utf8-flag=no length=2
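For comparison, here is a hedged sketch of getting both counts without touching the internal flag, by encoding explicitly. The choice of UTF-8 as the target encoding is an assumption made for illustration; the octet count depends on whichever encoding the data actually uses on the wire. The names $chars and $octets are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same character as in the example above: U+0160.
my $s = chr(0x0160);

# Character count of the string as Perl sees it.
my $chars = length($s);

# Octet count once the string is encoded as UTF-8 (assumed encoding).
# encode() returns a byte string, so length() counts octets here.
my $octets = length(encode('UTF-8', $s));

print "chars=$chars octets=$octets\n";
# prints: chars=1 octets=2
```

This leaves $s itself unchanged, whereas _utf8_off reinterprets the same internal buffer in place.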
Re: Size and anatomy of an HTTP response
by Anonymous Monk on Dec 15, 2010 at 11:43 UTC
    For HTTP::Message/HTTP::Response, ->content should always be encoded in bytes (octets, i.e. 8-bit)
    $ perl -MHTTP::Message -e" $a = HTTP::Message->new; $a->content( chr(500) )"
    HTTP::Message content must be bytes at -e line 1
    $ perl -e"print utf8::is_utf8( chr(500) )"
    1
    It relies on utf8::downgrade to check
    $success = utf8::downgrade($string[, FAIL_OK])
    Converts in-place the internal octet sequence in *UTF-X* to the equivalent octet sequence in the native encoding (Latin-1 or EBCDIC). *$string* already encoded as native 8 bit does no harm. Can be used to make sure that the UTF-8 flag is off, e.g. when you want to make sure that the substr() or length() function works with the usually faster byte algorithm.
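The check described above can be sketched with utf8::downgrade and its FAIL_OK argument. The helper name is_byte_string is hypothetical (it is not part of HTTP::Message); it only mirrors the idea of testing whether a string can be represented as plain 8-bit octets.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string can be stored as
# plain octets, 0 if it contains a character above 0xFF.
sub is_byte_string {
    my ($copy) = @_;    # work on a copy; utf8::downgrade modifies in place
    # With FAIL_OK true, downgrade returns false instead of dying.
    return utf8::downgrade($copy, 1) ? 1 : 0;
}

# A Latin-1 character that has been upgraded internally still fits in one octet.
my $upgraded = "\xE9";
utf8::upgrade($upgraded);

printf "%d %d %d\n",
    is_byte_string("abc"),
    is_byte_string($upgraded),
    is_byte_string(chr(500));
# prints: 1 1 0
```

A string that passes this check has the same length in characters and in bytes once downgraded, which is why length on ->content can be read as a byte count.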
