PerlMonks  

LWP $response->decoded_content different on different servers?

by Transalp (Sexton)
on Apr 29, 2015 at 09:31 UTC ([id://1125097])

Transalp has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a test script (below) that works well on one server, but not on two other servers. Does anyone have an idea what might cause this?

The script below contains two test URLs in @tests that use Thai script. According to the is_utf8 test, the decoded content of both URLs is UTF-8 after retrieval, but only one server prints $dc for both URLs in a readable form when the script is run in a browser.
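Worth noting about the is_utf8 check: Encode::is_utf8 reports Perl's internal UTF-8 flag on a string, not whether the bytes were decoded with the correct charset. A minimal sketch, using a single made-up byte rather than the real pages: 0xA1 decoded as cp874 (Encode's name for windows-874) and as ISO-8859-1 gives two flagged strings, but only the first is the Thai letter KO KAI.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# One raw byte: 0xA1 is KO KAI (U+0E01) in windows-874 / TIS-620,
# but an inverted exclamation mark (U+00A1) in ISO-8859-1.
my $bytes = "\xA1";

my $as_thai   = decode('cp874',      $bytes);
my $as_latin1 = decode('iso-8859-1', $bytes);

# Both decoded strings carry the internal UTF-8 flag...
printf "thai:   flag=%s  U+%04X\n", (is_utf8($as_thai)   ? 'on' : 'off'), ord $as_thai;
printf "latin1: flag=%s  U+%04X\n", (is_utf8($as_latin1) ? 'on' : 'off'), ord $as_latin1;
# ...but only the cp874 decoding yields the Thai character.
```

Since decoded_content returns a decoded character string, is_utf8 will typically report true whichever charset was chosen, so on its own it cannot tell a correct decoding from a wrong one.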

The other two servers print garbled text instead of Thai characters for the first URL, but handle the second URL fine. When I manually switch the browser's character encoding from Unicode to Thai on these two servers, the garbled text does become readable.

The problematic first URL does contain two Content-Type meta tags with different charsets in its HTML source, but this is apparently not an issue on the one server that works well.
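To see which charsets the page itself declares, a rough sketch that lists every charset= value found in <meta> tags, in document order. It is a regex scan rather than a real HTML parser, it only looks at the first 4 KB, and the helper name meta_charsets and the sample markup are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every charset declared in a <meta> tag,
# lower-cased, in the order it appears. Only the first 4 KB are scanned,
# since charset declarations are expected near the top of the document.
sub meta_charsets {
    my ($html_bytes) = @_;
    my $head = substr($html_bytes, 0, 4096);
    return map { lc } $head =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?([\w.:-]+)/gi;
}

# Illustrative markup with two conflicting declarations.
my $sample = '<html><head>'
  . '<meta http-equiv="Content-Type" content="text/html; charset=windows-874">'
  . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
  . '</head><body></body></html>';

print join(', ', meta_charsets($sample)), "\n";
```

Fed with the raw `$response->content` of the first URL, it would show both declarations and the order in which they appear.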

The one server that works well will be taken offline next week, so I'm trying to find the difference between the servers as soon as possible.

All three servers run CentOS 6.6, Perl v5.10.1 and Bundle::LWP 5.810.

Test script:
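#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Encode qw/ is_utf8 /;

my $agent = LWP::UserAgent->new();

# CGI header: tells the browser to expect UTF-8
print "Content-type: text/html; charset=utf-8\n\n";

## two websites using Thai script
my @tests = (
    'http://www.ranks.nl/images/test/intrend4kids.htm',
    'http://www.readyplanet.com/',
);

foreach my $uri (@tests) {
    eval {
        printf "test: %s <br>", $uri;
        my $response = $agent->get($uri);
        # raise_error => 1: die (caught by the eval) if the content cannot be decoded
        my $dc = $response->decoded_content( raise_error => 1 );
        printf "is decoded content utf8? %s <br>", is_utf8($dc);
        # STDOUT has no encoding layer: a string containing characters above 0xFF
        # is written as UTF-8 (with a "Wide character" warning), while one whose
        # characters all fit in 0x00-0xFF is written as one byte per character.
        print qq~ <br> <textarea cols=80 rows=20>$dc</textarea> <br> ~;
    };
    if ($@) {
        print "decode failed: $@\n";
    }
    print "\n";
}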

Added: the same problem occurs when I use curl instead of LWP.

Re: LWP $response->decoded_content different on different servers?
by Anonymous Monk on Apr 29, 2015 at 10:08 UTC
    Compare ddumps of the raw responses (the bytes) and post them here.
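A low-effort way to compare the raw bytes across servers without posting whole pages: a sketch using the core Digest::MD5 module (the helper name fingerprint is illustrative). The intended input is `$response->content`, which holds the body as received rather than decoded (it can still be compressed if the server used a Content-Encoding), and `$response->headers->as_string` can be printed next to it. If the fingerprints and headers match on all three servers, the difference lies in local decoding rather than in what was downloaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: summarise a byte string as its length and MD5 digest,
# so the output from each server can be compared line by line.
sub fingerprint {
    my ($bytes) = @_;
    return sprintf 'length=%d md5=%s', length($bytes), md5_hex($bytes);
}

# In the test script this would be called on the raw body, e.g.
#   print $response->headers->as_string, "\n";
#   print fingerprint($response->content), "\n";
print fingerprint('abc'), "\n";
```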

      >Compare ddumps

      Thanks for the suggestion. I'm not familiar with ddump yet... is this part of something called GHC? That seems a little over my head. I would probably need a system administrator's help to install it, so I'd prefer to look at other methods first.

      Running variations of the test script I found the following that might provide additional clues:

      When I add the line below, $charset does return the correct charset (windows-874) for the problem URL on the 'good' server:

      my $charset = $response->content_charset;

      On the other two servers, $charset remains empty for some reason, even though Bundle::LWP (which includes HTTP::Message, the module that provides the content_charset method) is identical on all three servers.
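A Bundle:: distribution only lists a set of modules; an identical bundle version does not guarantee that the installed versions of those modules match, or that the same copy is found first in @INC. Since content_charset can look inside the HTML itself (for example at <meta> declarations) when the Content-Type header carries no charset, a sketch that reports each relevant module's version and the file actually loaded may help compare the servers. The module list is an assumption: depending on the HTTP::Message release, HTML::Parser or IO::HTML may be the one used for the <meta> lookup, and module_report is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules to compare across servers (assumption: these are the relevant ones).
my @modules = qw(LWP HTTP::Message Encode HTML::Parser IO::HTML);

# Hypothetical helper: "Module version (path loaded from)" or "Module not installed".
sub module_report {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return "$module not installed" unless eval { require $file; 1 };
    my $version = $module->VERSION // 'unknown';
    return "$module $version ($INC{$file})";
}

print module_report($_), "\n" for @modules;
```

Running this on each server and diffing the output shows both version differences and cases where a different copy of a module is being picked up.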

      When I specify 'windows-874' as the charset or default_charset, the test works as expected on all three servers, but the test URLs are just examples; in the live situation the charset is unknown.

      ## these lines would both work on all three servers
      my $dc = $response->decoded_content( raise_error => 1 , charset => 'windows-874');
      my $dc = $response->decoded_content( raise_error => 1 , default_charset => 'windows-874');
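For the live case where the charset is unknown, one option is to pass an explicit charset only when HTTP::Message could not find one. A sketch, with hypothetical helper names, that falls back to the first charset declared in a <meta> tag. The regex is a rough stand-in for a real HTML parser, and it assumes the body is not compressed: with a Content-Encoding such as gzip, `$response->content` holds compressed bytes and the scan finds nothing, so decoded_content falls back to its own default.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: first charset declared in a <meta> tag within the
# first 1024 bytes of the raw body, lower-cased, or undef if none is found.
sub sniff_meta_charset {
    my ($bytes) = @_;
    my $head = substr($bytes, 0, 1024);
    return $head =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?([\w.:-]+)/i ? lc $1 : undef;
}

# Hypothetical wrapper around decoded_content: use the charset HTTP::Message
# found if there is one, otherwise the one sniffed from the raw bytes.
sub decode_response {
    my ($response) = @_;
    my $charset = $response->content_charset
               || sniff_meta_charset($response->content);
    return $response->decoded_content(
        raise_error => 1,
        ($charset ? (charset => $charset) : ()),
    );
}
```

In the test script, `my $dc = decode_response($response);` would replace the direct decoded_content call.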


        Um, no, I was on a flip phone, so typing was hard. I meant dumping with Data::Dump's dd() to visualize your data (lesson courtesy of the Basic debugging checklist).

        I guess I should have said hexdump :)
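Along the hexdump line, a core-Perl sketch (the helper name hexdump is illustrative) that formats bytes in rows of 16, showing the offset, the hex values and a printable-ASCII column, roughly like `hexdump -C`. Running it over the first few hundred bytes of `$response->content` on each server shows exactly which bytes arrived, independent of any decoding or browser setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as hex-dump rows of 16 bytes:
# offset, hex bytes, and the printable-ASCII view (other bytes shown as '.').
sub hexdump {
    my ($bytes) = @_;
    my @rows;
    for (my $offset = 0; $offset < length $bytes; $offset += 16) {
        my $chunk = substr($bytes, $offset, 16);
        my $hex   = join ' ', map { sprintf '%02x', ord } split //, $chunk;
        (my $ascii = $chunk) =~ tr/\x20-\x7e/./c;
        push @rows, sprintf '%08x  %-47s  |%s|', $offset, $hex, $ascii;
    }
    return join "\n", @rows;
}

# Raw bytes for 'ok ' followed by two windows-874 Thai letters.
# With a real response: print hexdump(substr($response->content, 0, 256)), "\n";
print hexdump("ok \xA1\xA2"), "\n";
```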

Node Type: perlquestion [id://1125097]
Approved by marto