PerlMonks  

Re^2: Perl LWP Can handle client-transfer-encoding = chunked encoding?

by sam_bakki (Pilgrim)
on Jan 19, 2015 at 14:07 UTC


in reply to Re: Perl LWP Can handle client-transfer-encoding = chunked encoding?
in thread Perl LWP Can handle client-transfer-encoding = chunked encoding?

Hi noxxi

As you suggested, I have written two scripts:
One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL, downloads only the first chunk (I think) from the server, and quits.

To show the difference between the downloads, I print the MD5 sum and file size of each.
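The size and MD5 check can also be done without slurping the whole file into memory. The following is only a sketch: `file_fingerprint` is a hypothetical helper name (it is not part of either script below), and the temporary file stands in for the downloaded PDF.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Return (size in bytes, MD5 hex digest) for a file, read in binary mode.
# Hypothetical helper for illustration only.
sub file_fingerprint {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    # addfile() streams the handle into the digest and returns the object
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return (-s $path, $md5);
}

# Stand-in for a downloaded file: a temp file holding five bytes
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "hello";
close $tmp_fh;

my ($size, $digest) = file_fingerprint($tmp_path);
printf "%d bytes, MD5 %s\n", $size, $digest;
```

Running the same helper on both downloaded copies would make a truncated file obvious from the size alone.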

My environment
OS: Windows 7, x86_64
Perl: Active Perl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int

Note: I saw the same behavior in Active Perl 5.10, 5.14, 5.16 and 5.18

Script 1 - Using Net::SSL and Crypt::SSLeay - Working

#WORKING HTTPS DOWNLOAD Using Net::SSL in Windows + Active Perl
use strict;
use warnings;
use Crypt::SSLeay;
use Net::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;

#Globals
$|=1;
#Force LWP to use Net::SSL instead of IO::Socket::SSL
$ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
delete $ENV{https_proxy} if exists $ENV{https_proxy};
delete $ENV{http_proxy} if exists $ENV{http_proxy};

#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();

print "\n USING Net::SSL";
#Init Mechanize
$browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1, ssl_opts => { 'verify_hostname' => 0 });
# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
$browser->timeout(120);

#Get URL
$browser->get($url);
if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);

    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}

print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;

Output1:


  USING Net::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 5465.48046875 KB
 INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb

Script 2 - Using IO::Socket::SSL - Not Working. Only part of the PDF file is downloaded

#NOT WORKING HTTPS DOWNLOAD Using IO::Socket::SSL in Windows + Active Perl
use strict;
use warnings;
use IO::Socket::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;

#Globals
$|=1;
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();

print "\n USING IO::Socket::SSL";
#Init Mechanize
$browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1,ssl_opts => { 'verify_hostname' => 0 });
# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
$browser->timeout(120);

#Get URL
$browser->get($url);
if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);

    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}

print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;

Output2:


  USING IO::Socket::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 6.66796875 KB
 INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c

I did not paste the Dumper output because it is huge and does not copy properly into the browser because of its binary contents.

Please help me understand why the scripts behave differently. I was thinking it is a chunking issue ...
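For context on the chunking suspicion: in HTTP/1.1 chunked transfer encoding, the body arrives as a series of chunks, each a hexadecimal size line followed by that many bytes and a CRLF, and a zero-size chunk ends the body. The sketch below decodes such a body from a string in plain Perl. `decode_chunked` is a hypothetical helper written for illustration; it is not how LWP or Net::HTTP implement decoding, and it ignores any trailer headers after the last chunk.

```perl
use strict;
use warnings;

# Decode an HTTP/1.1 chunked body held in a string.
# Returns the reassembled payload and the number of data chunks seen.
# Hypothetical helper for illustration only.
sub decode_chunked {
    my ($raw) = @_;
    my ($body, $chunks, $pos) = ('', 0, 0);

    while (1) {
        # Each chunk starts with a hex size line, optionally followed by ";ext"
        my $eol = index($raw, "\r\n", $pos);
        die "truncated chunk header\n" if $eol < 0;
        my ($hex) = substr($raw, $pos, $eol - $pos) =~ /^([0-9A-Fa-f]+)/
            or die "bad chunk size line\n";
        my $size = hex $hex;
        $pos = $eol + 2;

        last if $size == 0;    # a zero-size chunk ends the body

        die "truncated chunk data\n" if $pos + $size + 2 > length $raw;
        $body .= substr($raw, $pos, $size);
        $pos  += $size + 2;    # skip the data and its trailing CRLF
        $chunks++;
    }
    return ($body, $chunks);
}

# Two data chunks ("Hello" and ", world") followed by the terminating chunk
my $wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
my ($payload, $count) = decode_chunked($wire);
print "chunks=$count payload=$payload\n";
```

A client that stopped reading after the first chunk would be left with only the first piece of the payload, which is the kind of truncation the 6.67 KB file in Output2 might indicate.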

Thanks & Regards,
Bakkiaraj M
My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.

Replies are listed 'Best First'.
Re^3: Perl LWP Can handle client-transfer-encoding = chunked encoding?
by noxxi (Pilgrim) on Jan 22, 2015 at 21:17 UTC
