sam_bakki has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
While downloading data from an HTTPS URL, I get different results with Net::SSL and IO::Socket::SSL: IO::Socket::SSL does not download the full data.
To show what's really happening, I have two scripts below:
One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL, downloads only the first chunk (I think) from the server, and then quits.
To show the difference between the two downloads, I print the MD5 sum and the file size of each.
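If you want to compare the two saved files directly instead of reading both script outputs, a minimal sketch like the one below prints the size and MD5 of each file given on the command line. It reads the files in raw mode like the scripts do; the two file paths in the usage comment are hypothetical, since both scripts save to the same name `qtff-2001.pdf`.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return (size in KB, MD5 hex digest) for a file, read in binary mode.
sub file_size_and_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    my $bytes = (-s $path) || 0;    # -s is false for an empty file
    return ($bytes / 1024, $md5);
}

# Usage (hypothetical paths): perl compare.pl net_ssl/qtff-2001.pdf io_ssl/qtff-2001.pdf
for my $path (@ARGV) {
    my ($kb, $md5) = file_size_and_md5($path);
    print "$path Size: $kb KB, MD5 Sum: $md5\n";
}
```

`addfile` streams the file through the digest, so the whole PDF does not have to be slurped into memory first.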
My environment
OS: Windows 7, x86_64
Perl: ActivePerl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int
Note: I saw the same behavior with ActivePerl 5.10, 5.14, 5.16 and 5.18.
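For reports like this, a short sketch that prints only the OS, the Perl version and the versions of the HTTPS-related modules can be easier to paste than a full loaded-module dump. The module list below is my own selection for illustration; modules that fail to load are reported as "not installed" rather than aborting the script.

```perl
use strict;
use warnings;

# Modules involved in LWP's HTTPS handling (selection chosen for illustration).
my @modules = qw(
    LWP::UserAgent LWP::Protocol::https Net::HTTP Net::HTTPS
    IO::Socket::SSL Net::SSLeay Net::SSL Crypt::SSLeay Mozilla::CA
);

# Load a module if possible and return its version string.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'none';
}

printf "OS: %s, Perl: v%vd\n", $^O, $^V;
printf "%-22s %s\n", $_, module_version($_) for @modules;
```

Note that this loads the modules in a fresh process, so it shows what is installed, not necessarily which SSL backend a given script ended up using.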
Script 1 - Using Net::SSL and Crypt::SSLeay - Working
#WORKING HTTPS DOWNLOAD Using Net::SSL in Windows + Active Perl
use strict;
use warnings;
use Crypt::SSLeay;
use Net::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;
use Devel::ModuleDumper;

#Globals
$|=1;

#Force LWP to use Net::SSL instead of IO::Socket::SSL
$ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
delete $ENV{https_proxy} if exists $ENV{https_proxy};
delete $ENV{http_proxy} if exists $ENV{http_proxy};

#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();

print "\n USING Net::SSL";

#Init Mechanize
$browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1, ssl_opts => { 'verify_hostname' => 0 });

# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
$browser->timeout(120);

#Get URL
$browser->get($url);

if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);

    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}

print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;
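To double-check that the environment variable in Script 1 really makes LWP use Net::SSL, you can ask Net::HTTPS which socket class it selected. This is a minimal sketch that relies on the `$Net::HTTPS::SSL_SOCKET_CLASS` package variable; the variable is only read when Net::HTTPS is first loaded, so it is set in a `BEGIN` block here to be safe.

```perl
use strict;
use warnings;

# Must be set before Net::HTTPS is first loaded; Net::HTTPS reads it only once.
BEGIN { $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = 'Net::SSL' }

# Load Net::HTTPS (as far as I know, LWP::Protocol::https pulls it in on the
# first https request) and report which SSL socket class it chose, or why
# loading failed.
sub https_backend {
    no warnings 'once';
    return eval { require Net::HTTPS; 1 }
        ? $Net::HTTPS::SSL_SOCKET_CLASS
        : "unavailable: $@";
}

print "Net::HTTPS backend: ", https_backend(), "\n";
```

Running the same check without the `BEGIN` line should show which backend Script 2 gets by default (IO::Socket::SSL, if it is installed).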
Output1:
USING Net::SSL
INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
INFO: Save in File: qtff-2001.pdf
INFO: qtff-2001.pdf Size: 5465.48046875 KB
INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb

Loaded Modules
-------------------
Carp 1.3301
Compress::Raw::Bzip2 2.068
Compress::Raw::Zlib 2.068
Compress::Zlib 2.068
Crypt::SSLeay 0.72
Crypt::SSLeay::CTX none
Crypt::SSLeay::MainContext none
Crypt::SSLeay::X509 none
Data::Dumper 2.154
Digest::base 1.16
Digest::MD5 2.53
Encode 2.67
Encode::Alias 2.18
Encode::Config 2.05
Encode::Encoding 2.07
Errno 1.2003
Exporter 5.70
Exporter::Heavy 5.70
Fcntl 1.11
File::Glob 1.23
File::GlobMapper 1.000
File::Slurp 9999.19
HTML::Entities 3.69
HTML::Form 6.03
HTML::Parser 3.71
HTML::PullParser 3.57
HTML::Tagset 3.20
HTML::TokeParser 3.69
HTTP::Config 6.00
HTTP::Cookies 6.01
HTTP::Cookies::Netscape 6.00
HTTP::Date 6.02
HTTP::Headers 6.05
HTTP::Headers::Util 6.03
HTTP::Message 6.06
HTTP::Request 6.00
HTTP::Request::Common 6.04
HTTP::Response 6.04
HTTP::Status 6.03
IO 1.31
IO::Compress::Adapter::Deflate 2.068
IO::Compress::Base 2.068
IO::Compress::Base::Common 2.068
IO::Compress::Gzip 2.068
IO::Compress::Gzip::Constants 2.068
IO::Compress::RawDeflate 2.068
IO::Compress::Zlib::Constants 2.068
IO::Compress::Zlib::Extra 2.068
IO::File 1.16
IO::Handle 1.35
IO::Seekable 1.1
IO::Socket 1.37
IO::Socket::INET 1.35
IO::Socket::IP 0.35
IO::Socket::UNIX 1.26
IO::Uncompress::Adapter::Bunzip2 2.068
IO::Uncompress::Adapter::Inflate 2.068
IO::Uncompress::Base 2.068
IO::Uncompress::Bunzip2 2.068
IO::Uncompress::Gunzip 2.068
IO::Uncompress::Inflate 2.068
IO::Uncompress::RawInflate 2.068
List::Util 1.41
LWP 6.08
LWP::MemberMixin none
LWP::Protocol 6.06
LWP::Protocol::http none
LWP::Protocol::https 6.06
LWP::UserAgent 6.06
MIME::Base64 3.14
Net::HTTP 6.07
Net::HTTP::Methods 6.07
Net::HTTPS 6.04
Net::SSL 2.86
POSIX 1.38_03
Scalar::Util 1.41
SelectSaver 1.02
Socket 2.016
Storable 2.51
Symbol 1.07
Tie::Hash 1.05
Time::Local 1.2300
URI 1.65
URI::Escape 3.31
URI::http none
URI::https none
URI::_generic none
URI::_query none
URI::_server none
WWW::Mechanize 1.73
Script 2 - Using IO::Socket::SSL - Not working (only part of the PDF file is downloaded)
#NOT WORKING HTTPS DOWNLOAD Using IO::Socket::SSL in Windows + Active Perl
use strict;
use warnings;
use IO::Socket::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;
use Devel::ModuleDumper;

#Globals
$|=1;
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();

print "\n USING IO::Socket::SSL";

#Init Mechanize
$browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1, ssl_opts => { 'verify_hostname' => 0 });

# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
$browser->timeout(120);

#Get URL
$browser->get($url);

if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);

    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}

print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;
Output2:
USING IO::Socket::SSL
INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
INFO: Save in File: qtff-2001.pdf
INFO: qtff-2001.pdf Size: 6.66796875 KB
INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c

Loaded Modules
----------------
ActivePerl::Config none
Carp 1.3301
Compress::Raw::Bzip2 2.068
Compress::Raw::Zlib 2.068
Compress::Zlib 2.068
Cwd 3.48
Data::Dumper 2.154
Digest::base 1.16
Digest::MD5 2.53
Encode 2.67
Encode::Alias 2.18
Encode::Byte 2.04
Encode::Config 2.05
Encode::Encoding 2.07
Encode::Locale 1.03
Errno 1.2003
Exporter 5.70
Exporter::Heavy 5.70
Fcntl 1.11
File::Basename 2.85
File::Glob 1.23
File::GlobMapper 1.000
File::Slurp 9999.19
File::Spec 3.48
File::Spec::Unix 3.48
File::Spec::Win32 3.48
HTML::Entities 3.69
HTML::Form 6.03
HTML::Parser 3.71
HTML::PullParser 3.57
HTML::Tagset 3.20
HTML::TokeParser 3.69
HTTP::Config 6.00
HTTP::Cookies 6.01
HTTP::Cookies::Netscape 6.00
HTTP::Date 6.02
HTTP::Headers 6.05
HTTP::Headers::Util 6.03
HTTP::Message 6.06
HTTP::Request 6.00
HTTP::Request::Common 6.04
HTTP::Response 6.04
HTTP::Status 6.03
IO 1.31
IO::Compress::Adapter::Deflate 2.068
IO::Compress::Base 2.068
IO::Compress::Base::Common 2.068
IO::Compress::Gzip 2.068
IO::Compress::Gzip::Constants 2.068
IO::Compress::RawDeflate 2.068
IO::Compress::Zlib::Constants 2.068
IO::Compress::Zlib::Extra 2.068
IO::File 1.16
IO::Handle 1.35
IO::Seekable 1.1
IO::Socket 1.37
IO::Socket::INET 1.35
IO::Socket::IP 0.35
IO::Socket::SSL 2.010
IO::Socket::SSL::PublicSuffix none
IO::Socket::UNIX 1.26
IO::Uncompress::Adapter::Bunzip2 2.068
IO::Uncompress::Adapter::Inflate 2.068
IO::Uncompress::Base 2.068
IO::Uncompress::Bunzip2 2.068
IO::Uncompress::Gunzip 2.068
IO::Uncompress::Inflate 2.068
IO::Uncompress::RawInflate 2.068
List::Util 1.41
LWP 6.08
LWP::MemberMixin none
LWP::Protocol 6.06
LWP::Protocol::http none
LWP::Protocol::https 6.06
LWP::UserAgent 6.06
Mozilla::CA 20141217
Net::HTTP 6.07
Net::HTTP::Methods 6.07
Net::HTTPS 6.04
Net::SSLeay 1.66
POSIX 1.38_03
Scalar::Util 1.41
SelectSaver 1.02
Socket 2.016
Socket6 0.25
Storable 2.51
Symbol 1.07
Tie::Hash 1.05
Time::Local 1.2300
URI 1.65
URI::Escape 3.31
URI::http none
URI::https none
URI::_generic none
URI::_idna none
URI::_punycode 1.65
URI::_query none
URI::_server none
Win32::API 0.79
Win32::API::Struct 0.65
Win32::API::Type 0.69
Win32::Console 0.10
WWW::Mechanize 1.73
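Note that Script 2 still prints "Got URL", so `success()` (and `autocheck`) did not notice the short body. As far as I know, when reading the body fails part way, LWP keeps the original status code and records the error in the `X-Died` and `Client-Aborted` response headers instead. A minimal sketch of a check that could be added after `get()`, assuming those headers; the helper name `response_problem` is my own:

```perl
use strict;
use warnings;

# Return a description of why a response looks incomplete, or undef if it
# looks complete. Works on an HTTP::Response (header and content methods).
sub response_problem {
    my ($res) = @_;
    if (my $died = $res->header('X-Died')) {
        return "X-Died: $died";    # body-reading code died part way
    }
    if (my $aborted = $res->header('Client-Aborted')) {
        return "Client-Aborted: $aborted";
    }
    # content() holds the raw (still encoded) body, which is what
    # Content-Length counts, so the two lengths are comparable.
    my $expected = $res->header('Content-Length');
    my $got      = length $res->content;
    if (defined $expected && $got != $expected) {
        return "Content-Length is $expected but received $got bytes";
    }
    return;
}
```

In Script 2 it would be used right after the request, e.g. `my $problem = response_problem($browser->response()); print "\n WARN: incomplete download: $problem" if $problem;`, which should show the underlying read error instead of silently saving a 6 KB file.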
I did not paste the Dumper output because it's huge and does not copy into the browser properly because of the binary content.
Q: Why is IO::Socket::SSL not downloading the full data? What else do I need to do in Script 2?
Update: Added Module versions
Update1: I have tested Script 2 on Linux (Fedora 21, x64, Perl 5.18) and it works fine :). So this looks like a problem only on Windows + ActiveState Perl :(
Thanks & Regards,
Bakkiaraj M
My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.