spunk has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I've developed some WWW::Mechanize scripts that work without any problems, but now I am trying to route these scripts through a proxy (Privoxy v3.0.5 Beta running on my local Linux machine) and I'm finding that for all HTTPS requests, I always get 500 response codes. If I remove the Privoxy proxy from my Perl script, everything works. If I keep the proxy and just go to HTTP sites, everything works. If I configure my browser to go through the proxy and load an HTTPS page, everything works.
So to summarize so far...
HTTP request from WWW::Mechanize -> Privoxy => works!
HTTPS request from browser -> Privoxy => works!
HTTPS request from WWW::Mechanize -> direct connection to internet => works!
HTTPS request from WWW::Mechanize -> Privoxy => does NOT work!
Looking at Privoxy's detailed log file, the first sign of things going wrong appears to be that WWW::Mechanize passes a GET request to the proxy. The browsers do not do this; they use CONNECT. I don't know for sure whether this is correct, since CONNECT isn't really specified in the W3 HTTP 1.1 spec that I Googled.
My hypothesis is that Firefox has got it right and that WWW::Mechanize is not smart enough to use CONNECT instead of GET when requesting HTTPS pages through a proxy.
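To make the difference concrete, here is a minimal sketch of the two request lines that show up in the logs further down. It is only an illustration, not code taken from LWP or Privoxy: the helper names (host_and_port, connect_request_line, proxied_get_request_line) are made up, and the default port of 443 is an assumption for https URLs that carry no explicit port.

```perl
use strict;
use warnings;

# Hypothetical helper: pull host and port out of an https URL.
# Assumes port 443 when the URL does not name one.
sub host_and_port {
    my ($url) = @_;
    my ($host, $port) = $url =~ m{^https://([^/:]+)(?::(\d+))?}i
        or die "not an https URL: $url";
    return ($host, $port // 443);
}

# The kind of request line the browser sends to the proxy:
# it asks for a tunnel to host:port rather than for a page.
sub connect_request_line {
    my ($url) = @_;
    my ($host, $port) = host_and_port($url);
    return "CONNECT $host:$port HTTP/1.1";
}

# The kind of request line the LWP debug output shows being sent:
# a plain GET with the full https URL, handed to the proxy.
sub proxied_get_request_line {
    my ($url) = @_;
    return "GET $url HTTP/1.0";
}

my $url = 'https://login.yahoo.com';
print connect_request_line($url), "\n";
print proxied_get_request_line($url), "\n";
```

The first line matches what Privoxy logs for Firefox, the second matches what the LWP debug file records for my script.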
My questions to the group are...
1) Does all of this sound right?
2) How would I force a CONNECT from either WWW::Mechanize or LWP in this circumstance? Nothing is mentioned in any of the docs I've seen. Grepping the code didn't reveal anything to me either.
Here's my code....
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP;
use LWP::DebugFile;
require HTTP::Request;

sub main {
    my $cookie_jar = HTTP::Cookies->new(
        file         => 'cookies.dat',
        autosave     => 1,
        hide_cookie2 => 1
    );
    my $bot = WWW::Mechanize->new;
    $bot->max_redirect(100);
    $bot->cookie_jar($cookie_jar);
    $bot->proxy(['http', 'https'], 'http://192.168.250.11:8118/');
    $bot->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3');
    my $url = "https://login.yahoo.com";
    #my $url = "https://us.etrade.com";
    my $response = $bot->get($url);
    my $content  = $bot->content;
}

&main
Here is what the Privoxy log looks like when I use my Perl script...
Dec 09 22:53:30 Privoxy(b7f856c0) Info: Privoxy version 3.0.5
Dec 09 22:53:30 Privoxy(b7f856c0) Info: Program name: ./privoxy
Dec 09 22:53:30 Privoxy(b7f856c0) Info: Listening on port 8118 on IP address 192.168.250.11
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: New HTTP Request-Line: GET / HTTP/1.0
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: scan: GET / HTTP/1.0
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: scan: Accept-Encoding: identity
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: scan: Host: login.yahoo.com
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: scan: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: addh-unique: Host: login.yahoo.com
Dec 09 22:53:44 Privoxy(b7f84bb0) Header: Adding: Connection: close
Dec 09 22:53:44 Privoxy(b7f84bb0) Request: login.yahoo.com/
Dec 09 22:53:44 Privoxy(b7f84bb0) Writing: �
Dec 09 22:53:45 Privoxy(b7f84bb0) Writing: GET / HTTP/1.0
Accept-Encoding: identity
Host: login.yahoo.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3
Connection: close
Dec 09 22:53:47 Privoxy(b7f84bb0) Header: Adding: Connection: close
Dec 09 22:53:47 Privoxy(b7f84bb0) Writing: Connection: close

Here is the LWP debug information generated...

# LWP::DebugFile logging to lwp_457baef8_5876.log
# Time now: {1165733624} = Sat Dec 9 22:53:44 2006
LWP::UserAgent::new: ()
LWP::UserAgent::proxy: ARRAY(0x8ce0c98) http://192.168.250.11:8118/
LWP::UserAgent::proxy: http http://192.168.250.11:8118/
LWP::UserAgent::proxy: https http://192.168.250.11:8118/
LWP::UserAgent::request: ()
HTTP::Cookies::add_cookie_header: Checking login.yahoo.com for cookies
HTTP::Cookies::add_cookie_header: Checking .yahoo.com for cookies
HTTP::Cookies::add_cookie_header: Checking yahoo.com for cookies
HTTP::Cookies::add_cookie_header: Checking .com for cookies
LWP::UserAgent::send_request: GET https://login.yahoo.com
LWP::UserAgent::_need_proxy: Proxied to http://192.168.250.11:8118/
LWP::Protocol::http10::request: ()
LWP::Protocol::http10::request: S>0 "GET https://login.yahoo.com HTTP/1.0\x0D\x0A"
LWP::Protocol::http10::request: S>+ "Accept-Encoding: identity\x0D\x0A"
LWP::Protocol::http10::request: S>+ "Host: login.yahoo.com\x0D\x0A"
LWP::Protocol::http10::request: S>+ "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1. 8.0.3) Gecko/20060426 Firefox/1.5.0.3\x0D\x0A\x0D\x0A"
LWP::Protocol::http10::request: reading response
# Time now: {1165733627} = Sat Dec 9 22:53:47 2006
LWP::Protocol::http10::request: S>0 "Connection: close\x0D\x0A\x0D\x0A"
LWP::Protocol::http10::request: HTTP/0.9 assume OK
LWP::Protocol::collect: read 21 bytes
LWP::UserAgent::request: Simple response: OK
Here is what the Privoxy log file looks like when a browser (Firefox in this case) requests Yahoo's login page through the proxy...
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: scan: CONNECT login.yahoo.com:443 HTTP/1.1
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: scan: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.8) Gecko/20061109 CentOS/1.5.0.8-0.1.el4.centos4 Firefox/1.5.0.8 pango-text
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: scan: Proxy-Connection: keep-alive
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: scan: Host: login.yahoo.com
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: crumble crunched: Proxy-Connection: keep-alive!
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: addh-unique: Host: login.yahoo.com:443
Dec 09 22:56:06 Privoxy(b7f84bb0) Header: Adding: Connection: close
Dec 09 22:56:06 Privoxy(b7f84bb0) Request: login.yahoo.com:443/
Dec 09 22:56:06 Privoxy(b7f84bb0) Writing: �
Dec 09 22:56:09 Privoxy(b7f84bb0) Writing: HTTP/1.0 200 Connection established
Proxy-Agent: Privoxy/3.0.5
(...encrypted traffic follows.)
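As far as I can tell, the "200 Connection established" status line in the Firefox log is the proxy's signal that the tunnel is open and the encrypted traffic can start. A small sketch of how a client might check that reply is below. The function name tunnel_established is hypothetical, and it only inspects a status line passed in as a string; it does not talk to Privoxy or to any real socket.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the proxy's status line carries a
# 2xx code (tunnel opened), 0 for anything else or for an unparsable line.
sub tunnel_established {
    my ($status_line) = @_;
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})\b}
        or return 0;
    return ($code >= 200 && $code < 300) ? 1 : 0;
}

# Status line copied from the Firefox run in the Privoxy log.
my $reply = 'HTTP/1.0 200 Connection established';
print tunnel_established($reply) ? "tunnel open\n" : "tunnel refused\n";
```

My script never gets this far, since it never sends a CONNECT in the first place.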
I am at a real loss for what to do next; any help would be greatly appreciated. So many sites have encrypted login pages that this impacts almost all of the sites that I want to automate.