sam_bakki has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks
I have a Perl script that downloads data from an HTTPS site. With Crypt::SSLeay the script works fine, and I can download the full data (a CSV file) from the server.
I decided to give IO::Socket::SSL, the SSL backend that LWP uses by default, a try.
My script uses WWW::Mechanize, and with IO::Socket::SSL it failed in the
$mech->response()->decoded_content() phase. Debugging further, I found that it could not decompress the gzip-compressed data sent by the server.
That surprised me, so I disabled compression with $mech->add_header('Accept-Encoding' => '');
Now I can see data arriving from the server, but it is incomplete: I get only the first few bytes. Examining the HTTP::Response headers, I find
'client-transfer-encoding' => [ 'chunked' ]
It looks like the server is sending chunked data, and LWP with IO::Socket::SSL cannot handle the "chunked" transfer encoding, so decoding the gzip content fails.
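For context, here is a minimal sketch of what decoding a "chunked" body involves: each chunk is a hexadecimal size line, the data, and a trailing CRLF, and a zero-size chunk ends the body. The helper name decode_chunked is hypothetical, it ignores trailers, and it is not how LWP implements this internally. It only illustrates why a stream that stops early yields a short body.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): decode a complete chunked body
# that is already held in memory. Dies if the stream ends early.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        # Each chunk starts with its size in hex, optional extensions, then CRLF.
        my $eol = index($raw, "\r\n", $pos);
        die "truncated: missing chunk-size line\n" if $eol < 0;
        my $line = substr($raw, $pos, $eol - $pos);
        my ($hex) = $line =~ /^([0-9A-Fa-f]+)/
            or die "malformed chunk-size line\n";
        my $size = hex $hex;
        $pos = $eol + 2;
        last if $size == 0;    # a zero-size chunk marks the end of the body
        die "truncated: chunk shorter than declared\n"
            if $pos + $size + 2 > length $raw;
        $body .= substr($raw, $pos, $size);
        $pos += $size + 2;     # skip the chunk data and its trailing CRLF
    }
    return $body;
}

print decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";   # prints "Wikipedia"
```

If the input is cut off in the middle of a chunk, the sketch dies with a "truncated" message instead of returning a partial body.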
When I force the use of Crypt::SSLeay as shown below,

    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    ....
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $mech = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });
    ...

I get the full data from the server. The "chunked" header is still present, but Net::SSL / Crypt::SSLeay handles it properly.
Q: Has anyone else faced this issue? Can Perl LWP handle "chunked" data transfer over SSL? Thanks for your time.
Update: I have added two test scripts to demonstrate the problem.
One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL, downloads only the first chunk (I think) from the server, and quits.
To show the difference between the downloads, I print the MD5 sum and file size of each.
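The same comparison can be made without re-running the downloads. This is a minimal sketch using only core modules; the helper name file_fingerprint and the two file names in the usage comment are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return (size in bytes, hex MD5) for a file,
# reading it in raw mode so binary content is not altered on Windows.
sub file_fingerprint {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    my $size = -s $path;
    return ($size, $md5);
}

# Usage (file names are placeholders for the two saved downloads):
# my ($size_a, $md5_a) = file_fingerprint('netssl.pdf');
# my ($size_b, $md5_b) = file_fingerprint('iosocketssl.pdf');
# print "Downloads differ\n" if $size_a != $size_b or $md5_a ne $md5_b;
```

Because addfile streams the file through the digest, a multi-megabyte download does not need to be held in memory.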
My environment
OS: Windows 7, 64-bit (x86_64)
Perl: ActivePerl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int
Note: I saw the same behavior with ActivePerl 5.10, 5.14, 5.16 and 5.18
Script 1 - Using Net::SSL and Crypt::SSLeay - Working
    #WORKING HTTPS DOWNLOAD Using Net::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;

    #Globals
    $|=1;

    #Force LWP to use Net::SSL instead of IO::Socket::SSL
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    delete $ENV{https_proxy} if exists $ENV{https_proxy};
    delete $ENV{http_proxy} if exists $ENV{http_proxy};

    #Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING Net::SSL";

    #Init Mechanize
    $browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias('Linux Mozilla');
    $browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
    $browser->timeout(120);

    #Get URL
    $browser->get($url);
    if ($browser->success()) {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        #Calculate MD5 sum
        $pageContent = read_file( $fileName, binmode => ':raw' );
        print "\n INFO: $fileName Size: ", length($pageContent)/1024, " KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else {
        print "\n ERROR: Can't get URL $url ", $browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n", Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;
Output1:
    USING Net::SSL
    INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
    INFO: Save in File: qtff-2001.pdf
    INFO: qtff-2001.pdf Size: 5465.48046875 KB
    INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb
Script 2 - Using IO::Socket::SSL - Not Working. Only part of the PDF file is downloaded
    #NOT WORKING HTTPS DOWNLOAD Using IO::Socket::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use IO::Socket::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;

    #Globals
    $|=1;
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

    #Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING IO::Socket::SSL";

    #Init Mechanize
    $browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias('Linux Mozilla');
    $browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
    $browser->timeout(120);

    #Get URL
    $browser->get($url);
    if ($browser->success()) {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        #Calculate MD5 sum
        $pageContent = read_file( $fileName, binmode => ':raw' );
        print "\n INFO: $fileName Size: ", length($pageContent)/1024, " KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else {
        print "\n ERROR: Can't get URL $url ", $browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n", Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;
Output2:
    USING IO::Socket::SSL
    INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
    INFO: Save in File: qtff-2001.pdf
    INFO: qtff-2001.pdf Size: 6.66796875 KB
    INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c
I did not paste the Dumper output because it is huge and, because of the binary content, does not copy properly into the browser.
Please help me understand why the two scripts behave differently. I suspect a chunking issue ...
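A smaller alternative to the full dump is to print only the status line, the headers, and the body length. This is a minimal sketch, assuming HTTP::Response from the LWP distribution is available; the helper name response_summary is hypothetical. It also flags the X-Died and Client-Aborted headers, which, as far as I know, LWP adds to the response when reading the body was aborted.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarize a response without printing its body,
# so binary content never reaches the terminal or the browser.
sub response_summary {
    my ($res) = @_;
    my $summary = $res->status_line . "\n" . $res->headers->as_string;
    # Assumption: LWP sets these headers itself when the transfer was cut short.
    for my $name (qw(X-Died Client-Aborted)) {
        my $value = $res->header($name);
        $summary .= "WARNING: $name is set ($value)\n" if defined $value;
    }
    $summary .= "Body length: " . length($res->content) . " bytes\n";
    return $summary;
}

# Usage with the scripts above: print response_summary($browser->response());
```

If either warning appears in the IO::Socket::SSL run but not in the Net::SSL run, the text of the X-Died header should say why the read stopped.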
Thanks & Regards,
Bakkiaraj M
My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.
Replies are listed 'Best First'.
Re: Perl LWP Can handle client-transfer-encoding = chunked encoding?
by FloydATC (Deacon) on Jan 16, 2015 at 09:02 UTC
by sam_bakki (Pilgrim) on Jan 16, 2015 at 09:38 UTC
by FloydATC (Deacon) on Jan 16, 2015 at 10:01 UTC
by noxxi (Pilgrim) on Jan 17, 2015 at 08:07 UTC
Re: Perl LWP Can handle client-transfer-encoding = chunked encoding?
by noxxi (Pilgrim) on Jan 16, 2015 at 09:45 UTC
by sam_bakki (Pilgrim) on Jan 19, 2015 at 14:07 UTC
by noxxi (Pilgrim) on Jan 22, 2015 at 21:17 UTC
by sam_bakki (Pilgrim) on Jan 16, 2015 at 15:45 UTC