PerlMonks  

Perl https parser

by onlyakhila (Initiate)
on Jul 16, 2007 at 16:06 UTC

onlyakhila has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse an HTTPS page using Perl. When I run the program below, I get the error:
Error: 500 Connect failed: connect: Unknown error
Also, how can I change $ua->agent('Mozilla/5.0'); to take the Microsoft XP browser?
#!/usr/local/bin/perl
$ENV{"HTTP_PROXY"} = "http://http-proxy:xx";
$ENV{"HTTP_PROXY_USER"} = "xxxx";
$ENV{"HTTP_PROXY_PASS"} = "xxxx";
use LWP::UserAgent;
use XML::RSS;
$ua = LWP::UserAgent->new;
$req = HTTP::Request->new(GET=> 'https://www.paypal.com/');
$ua->env_proxy();
$ua->agent('Mozilla/5.0');
$res = $ua->request($req);
if ($res->is_success) {
    #print ($res->content);
    printf "fetched %d bytes\n", length($res->content);
}
else {
    print "Error: " . $res->code . " " . $res->message;
}
Thank you, Akhila

Re: Perl https parser
by derby (Abbot) on Jul 16, 2007 at 16:47 UTC

    Works fine for me. But

    • I don't need to go through a proxy
    • you really don't need XML::RSS for this snippet
    • What the heck does "to take the microsoft xp browser" mean?
    • Where's the parsing problem? I just see a retrieval problem.

    More than likely, it's a proxy configuration issue.
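    One way to test the proxy theory is to ask LWP which proxy it actually picked up after env_proxy(). The sketch below reuses the placeholder proxy from the question (http://http-proxy:xx). As far as I know, env_proxy() only reads variables named *_proxy (plus no_proxy), so HTTP_PROXY_USER and HTTP_PROXY_PASS from the original script are never used by LWP; if the proxy needs a login for plain-http requests, HTTP::Request's proxy_authorization_basic($user, $pass) is one way to send it.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder proxy from the question; substitute the real host:port.
$ENV{HTTP_PROXY}      = 'http://http-proxy:xx';
# Not a *_proxy variable, so env_proxy() does not read it.
$ENV{HTTP_PROXY_USER} = 'xxxx';

my $ua = LWP::UserAgent->new;
$ua->env_proxy();

# With one argument, proxy() returns the proxy URL configured for that
# scheme, or undef if there is none.
for my $scheme (qw(http https)) {
    my $proxy = $ua->proxy($scheme);
    printf "%-5s proxy: %s\n", $scheme, defined $proxy ? $proxy : '(none)';
}
```

    If the http line shows "(none)" or an unexpected host, the problem is in the environment rather than in the request code.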

    -derby
Re: Perl https parser
by almut (Canon) on Jul 16, 2007 at 18:55 UTC

    IIRC, there are some known issues with proxying HTTPS connections... Actually, Google turned up this (see the yellow box on the right, and follow the links referenced from there). You might want to try the suggested workaround, which is essentially to set HTTPS_PROXY and let the underlying SSL library (Crypt::SSLeay) do the proxying in place of LWP. Good luck!
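    A minimal sketch of that workaround, assuming Crypt::SSLeay is the SSL backend LWP is using (as was typical at the time). Crypt::SSLeay reads HTTPS_PROXY, HTTPS_PROXY_USERNAME and HTTPS_PROXY_PASSWORD from the environment itself, and its documentation advises against mixing this with env_proxy(), which would pick up HTTPS_PROXY as well. The host, port and credentials are the placeholders from the question; setting the plain-http proxy explicitly with $ua->proxy('http', ...) is an assumption added here so that http:// URLs keep working without env_proxy().

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Read by Crypt::SSLeay itself, not by LWP (placeholder values from the
# question; substitute your own).
$ENV{HTTPS_PROXY}          = 'http://http-proxy:xx';
$ENV{HTTPS_PROXY_USERNAME} = 'xxxx';
$ENV{HTTPS_PROXY_PASSWORD} = 'xxxx';

my $ua = LWP::UserAgent->new;
$ua->timeout(15);              # fail fast if the proxy is unreachable
$ua->agent('Mozilla/5.0');

# Plain http:// URLs still go through LWP's own proxy support. There is
# deliberately no $ua->env_proxy() call: it would also pick up HTTPS_PROXY
# and have LWP proxy https:// itself, defeating the workaround.
$ua->proxy('http', 'http://http-proxy:xx');

my $res = $ua->get('https://www.paypal.com/');
if ($res->is_success) {
    printf "fetched %d bytes\n", length($res->content);
}
else {
    print "Error: ", $res->status_line, "\n";
}
```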

Re: Perl https parser
by EvanK (Chaplain) on Jul 16, 2007 at 17:48 UTC
    As derby suggested above, it looks like a proxy configuration issue. As for the "microsoft xp browser", I assume you're asking how to set the agent to look like it's coming from Internet Explorer? If so, you need to determine what user agent string you want to send (it varies between Windows installations), and then simply supply it to the agent method. You can basically provide it any string:
    $ua->agent('My own custom user agent');
    $ua->agent('Anything you want!');
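    For the Internet Explorer case specifically, a sketch could look like this. The string below is only an example of the format IE 6 used on Windows XP ("Windows NT 5.1" is the XP version token); the exact string differs between installations, so copy the one your own browser sends.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Example only: the style of string IE 6 sent on Windows XP
# ("Windows NT 5.1" is XP). Replace with what your own browser sends.
my $ie_on_xp = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
$ua->agent($ie_on_xp);

# agent() with no argument returns the User-Agent string that will be sent.
print "User-Agent: ", $ua->agent, "\n";
```

    One LWP detail worth knowing: if the string passed to agent() ends with a space, LWP appends its own default identifier (libwww-perl/<version>) to it, so leave off trailing whitespace when the browser string should be sent verbatim.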

    __________
    Systems development is like banging your head against a wall...
    It's usually very painful, but if you're persistent, you'll get through it.

      Thank you for your replies. I am trying to get the content from a web page (which has XML feeds in RSS 1.0, 2.0 and Atom) and pass it through an RSS parser to extract data, which is why I am using XML::RSS. Firstly, I was not able to retrieve the RSS feed from the HTTPS web page, but I am able to do so for HTTP sites. If I don't set a proxy, I am not able to get to HTTP sites either. Secondly, I tried running the RSS parser on an HTTP site (http://www.cnn.com/) and got an error:
      no element found at line 1, column 0, byte -1 at I:/Perl5.8.8.817/lib/XML/Parser.pm line 187
      The code using XML::RSS is:
      #!/usr/local/bin/perl
      $ENV{"HTTP_PROXY"} = "http://http-proxy:xx";
      $ENV{"HTTP_PROXY_USER"} = "xxxx";
      $ENV{"HTTP_PROXY_PASS"} = "xxxx";
      use LWP::UserAgent;
      use XML::RSS;
      $ua = LWP::UserAgent->new;
      $req = HTTP::Request->new(GET => 'http://www.cnn.com/');
      $ua->env_proxy();
      $ua->agent('Mozilla/5.0');
      $res = $ua->request($req);
      if ($res->is_success) {
          #print ($res->content);
          printf "fetched %d bytes\n", length($res->content);
      }
      else {
          print "Error: " . $res->code . " " . $res->message;
      }
      my $rss = new XML::RSS;
      $rss->parse($content);
      #print "rss is $rss\n";
      I have read that XML::RSS supports all forms of RSS feed. Is that so?
      I would appreciate any help I can get on building an RSS parser in Perl for HTTPS sites.
      Thank you, Akhila
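      Going by the code as posted, two things stand out. $content is never assigned, so $rss->parse() receives undef, which would explain the "no element found at line 1, column 0" error from empty input. And http://www.cnn.com/ is an HTML page rather than a feed, so XML::RSS would most likely fail on it even with the body passed in. Below is a sketch of the parsing step using a small inline RSS 2.0 document (hypothetical sample data) so it runs offline; in the real script you would request the site's feed URL and pass $res->content instead. To my knowledge XML::RSS covers RSS 0.9x, 1.0 and 2.0 but not Atom, so Atom feeds need a different module (XML::Feed is one that handles both).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical sample feed so the sketch runs without a network connection.
# In the real script, use the body LWP fetched: $rss->parse($res->content).
my $xml = <<'END_RSS';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample data</description>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
END_RSS

my $rss = XML::RSS->new;
$rss->parse($xml);    # dies if the input is empty or not well-formed XML

print "Channel: $rss->{channel}{title}\n";
for my $item (@{ $rss->{items} }) {
    print "  $item->{title} <$item->{link}>\n";
}
```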
Re: Perl https parser
by grashoper (Monk) on Jul 18, 2007 at 22:10 UTC
    You could use the Microsoft Fiddler tool to find out what user agent string IE sends, then substitute that for Mozilla. I think it will work.

Node Type: perlquestion [id://626871]
Approved by Old_Gray_Bear