PerlMonks  

Problem with LWP::UserAgent

by Paradigma (Novice)
on Jun 22, 2019 at 20:48 UTC (node #11101736)

Paradigma has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I'm experiencing Perl crashes when calling LWP::UserAgent->request() in my code. As far as I can tell, it is called properly with a valid HTTP::Request object. The problem occurs only on some webs, so I guess the objects might not be sufficiently initialized, or might be missing some required properties. Can anyone suggest what I should try in order to fix the crashes?

My specs: Perl v5.26.3 on Windows 8.1 64-bit.

I usually build the request like this:

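use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;
my $url = 'https://www.7digital.com/';   # e.g. one of the sites that crash
my $ua  = LWP::UserAgent->new(
    'agent' => 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
);
my $hdr = HTTP::Headers->new;            # header object passed to HTTP::Request->new
$hdr->header(
    'Content-Type'   => 'text/plain',
    'Content-Length' => 0,
);
my $req = HTTP::Request->new(GET => $url, $hdr);
my $res = $ua->request($req);   # <=================== crashes here with C0000005 (access violation)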

Re: Problem with LWP::UserAgent
by holli (Abbot) on Jun 22, 2019 at 22:58 UTC
    The problem occurs only on some webs
    Would you be so kind as to tell us which webs these are? So we can look and see if the webs are webbing correctly?


    holli

    You can lead your users to water, but alas, you cannot drown them.

      For example https://www.7digital.com/

      On further investigation I found that some sites send gzip-compressed pages, and all of those, at least, make Perl crash. I don't know whether I can enable support for that with an additional header; what I have tried so far, HTTP::Headers->header('Accept-Encoding' => 'gzip'), doesn't work.
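A minimal offline sketch of how HTTP::Message (the response class LWP returns) handles gzip, assuming the HTTP::Message distribution and IO::Compress are installed. Instead of fetching a real page, it builds a gzip-compressed HTTP::Response by hand, so the page body and headers are made up for illustration. It shows that content() returns the still-compressed bytes, decoded_content() returns the decoded page, and HTTP::Message::decodable() returns a value suitable for an Accept-Encoding request header.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Message;
use IO::Compress::Gzip qw(gzip $GzipError);

# Made-up page body, as the server has it before compression
my $html = "<html><head><title>Example</title></head><body>Hi</body></html>";

# Simulate what a gzip-compressing server sends: compressed bytes plus a
# Content-Encoding header that says so
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";
my $res = HTTP::Response->new(200, 'OK');
$res->header('Content-Type'     => 'text/html; charset=UTF-8');
$res->header('Content-Encoding' => 'gzip');
$res->content($compressed);

# content() gives the raw, still-compressed bytes...
my $raw = $res->content;

# ...while decoded_content() removes the gzip encoding and decodes the charset
my $page = $res->decoded_content;
print "$page\n";

# Encodings this HTTP::Message installation can decode; usable as the value
# of an Accept-Encoding request header
my $accept = HTTP::Message::decodable();
print "Accept-Encoding: $accept\n";
```

Against a real server the same pieces combine as $ua->get($url, 'Accept-Encoding' => HTTP::Message::decodable()), reading the body with $res->decoded_content rather than $res->content.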

      There may also be a problem with an expired SSL certificate not being handled by LWP::UserAgent->request().
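A hedged sketch for testing the certificate theory, assuming LWP::Protocol::https and IO::Socket::SSL are installed (LWP needs them for https:// URLs). With certificate verification switched on, an expired certificate should normally show up as an unsuccessful response whose status_line names the SSL error, not as a crash. The second agent turns verification off purely as a diagnostic: if its request succeeds where the first one fails, the certificate is the likely culprit. The report() helper is a hypothetical name used only here, and verification should not stay disabled in real code.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Normal agent: certificates and hostnames are verified (set explicitly here)
my $ua = LWP::UserAgent->new(ssl_opts => { verify_hostname => 1 });

# Diagnostic-only agent: skips hostname and certificate checks.
# SSL_verify_mode 0 is IO::Socket::SSL's SSL_VERIFY_NONE.
my $insecure = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 0, SSL_verify_mode => 0 },
);

# Hypothetical helper: fetch a URL and return "OK" or the failure reason
sub report {
    my ($agent, $url) = @_;
    my $res = $agent->get($url);
    return $res->is_success ? 'OK' : $res->status_line;
}

# Example usage (needs network access):
# for my $agent ($ua, $insecure) {
#     print report($agent, 'https://www.7digital.com/'), "\n";
# }
```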

      In the meantime I'm fetching this site externally with cURL, but I'm not comfortable with that because the content doesn't seem to parse well. In any case I would prefer to retrieve the pages from within Perl using a module.

      Alternatively, I would welcome a suggestion for a different, more robust Perl framework for fetching pages.

        The code below works just fine for me. Try running it. It could be you are missing a dependency or something. What is the exact error message you get?
        use strict;
        use LWP::UserAgent;
        use HTTP::Headers;
        my $ua = LWP::UserAgent->new(
            'agent' => 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
        );
        my $hdr = HTTP::Headers->new(
            'Content-Type'   => 'text/plain',
            'Content-Length' => 0,
        );
        my $url = "https://www.7digital.com/";
        my $req = HTTP::Request->new(GET => $url, $hdr);
        my $res = $ua->request($req);
        if ($res->is_success) {
            print $res->decoded_content;
        }
        else {
            die $res->status_line;
        }
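On the question of a missing dependency, a small sketch that reports which of the modules involved in fetching https:// pages and decoding gzip can be loaded, and their versions. The module list is an illustrative selection, not an official dependency list; comparing the output from a machine where the request works with the one that crashes may help narrow things down.

```perl
use strict;
use warnings;

# Illustrative selection of modules used for https:// fetching and gzip decoding
my @modules = qw(
    LWP::UserAgent
    HTTP::Message
    LWP::Protocol::https
    IO::Socket::SSL
    Net::SSLeay
    Compress::Zlib
    IO::Uncompress::Gunzip
);

my %status;
for my $mod (@modules) {
    (my $file = "$mod.pm") =~ s{::}{/}g;   # LWP::UserAgent -> LWP/UserAgent.pm
    if (eval { require $file; 1 }) {
        $status{$mod} = $mod->VERSION // 'unknown version';
    }
    else {
        $status{$mod} = 'missing or failed to load';
    }
    printf "%-24s %s\n", $mod, $status{$mod};
}
```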


      my $ua = LWP::UserAgent->new(
          'agent' => 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
      );
      my $res = $ua->get($url, 'Content-Length' => 0, 'Accept-Encoding' => 'gzip');
      if ($res->is_success) {
          my $tree = HTML::TreeBuilder::XPath->new_from_content(Compress::Zlib::memGunzip($res->content()));
          ...
          ...
      }

      ^^ This doesn't work either

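        The code holli posted works fine for me. Adding the gzip Accept-Encoding header also works, but the code works without it as well. The code you posted last works if you replace Compress::Zlib::memGunzip($res->content()) with $res->decoded_content, which undoes whatever Content-Encoding the server actually applied (and decodes the character set), so it doesn't matter whether the body arrived gzipped.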
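One possible reason the memGunzip version fails (an assumption, not something confirmed in this thread) is that the body is not actually gzip-compressed, for example when a server ignores Accept-Encoding. This offline sketch builds an uncompressed HTTP::Response by hand, with a made-up body, to show the difference: Compress::Zlib::memGunzip returns undef for data that is not gzip, while decoded_content returns the markup because it only undoes the encodings named in the Content-Encoding header.

```perl
use strict;
use warnings;
use Compress::Zlib ();
use HTTP::Response;

# A made-up response whose body was sent uncompressed (no Content-Encoding)
my $res = HTTP::Response->new(200, 'OK');
$res->header('Content-Type' => 'text/html; charset=UTF-8');
$res->content('<html><head><title>Example</title></head></html>');

# memGunzip expects gzip data and returns undef for anything else
my $body      = $res->content;
my $gunzipped = Compress::Zlib::memGunzip($body);

# decoded_content only undoes the encodings listed in Content-Encoding
# (none here) and then decodes the charset, so the markup comes back intact
my $html = $res->decoded_content;

print 'memGunzip:       ', (defined $gunzipped ? 'data' : 'undef'), "\n";
print "decoded_content: $html\n";
```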

Approved by Paladin