PerlMonks  

Re: Problem with LWP::UserAgent

by holli (Abbot)
on Jun 22, 2019 at 22:58 UTC [id://11101741]


in reply to Problem with LWP::UserAgent

The problem occurs only on some webs
Would you be so kind as to tell us which webs these are, so we can look and see whether the webs are webbing correctly?


holli

You can lead your users to water, but alas, you cannot drown them.

Re^2: Problem with LWP::UserAgent
by Paradigma (Novice) on Jun 23, 2019 at 11:47 UTC

    For example https://www.7digital.com/

    On further investigation I discovered that some sites send pages compressed with gzip, and at least all of those make Perl crash. I don't know whether I can enable support with an additional header attribute; what I have tried so far, HTTP::Headers->header('Accept-Encoding' => 'gzip'), doesn't work.

    There may also be a problem with an expired SSL certificate not being handled by LWP::UserAgent->request().

    In the meantime I'm fetching this site externally with cURL, but I'm not too comfortable with that, as the content doesn't seem to be parsed well. In any case I would prefer to retrieve the web pages internally via a Perl module.

    Alternatively, I would accept a suggestion for a different, more robust Perl framework for fetching pages.
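As a minimal sketch of the header question above: header() is an instance method, so it has to be called on a request (or headers) object rather than on the HTTP::Headers class itself. The snippet below builds a request without sending it and fills Accept-Encoding from HTTP::Message::decodable(), which lists the encodings the local installation can decode. Whether this has any bearing on the crash is an assumption; it only shows how the header would normally be set.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Message;    # provides decodable()

# Encodings that decoded_content() can undo on this installation,
# as a comma-separated string in scalar context.
my $encodings = HTTP::Message::decodable();

# Build the request object; nothing is sent over the network here.
my $req = HTTP::Request->new(GET => 'https://www.7digital.com/');

# header() is called on the request object, not on the class.
$req->header('Accept-Encoding' => $encodings);

# Show the request line and headers exactly as they would be sent.
print $req->as_string;
```

Passing the finished $req to $ua->request($req) would then send it with that header.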

      The code below works just fine for me. Try running it. It could be you are missing a dependency or something. What is the exact error message you get?
      use strict;
      use LWP::UserAgent;
      use HTTP::Headers;

      my $ua = LWP::UserAgent->new(
          'agent' => 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
      );
      my $hdr = HTTP::Headers->new(
          'Content-Type'   => 'text/plain',
          'Content-Length' => 0,
      );
      my $url = "https://www.7digital.com/";
      my $req = HTTP::Request->new(GET => $url, $hdr);
      my $res = $ua->request($req);
      if ($res->is_success) {
          print $res->decoded_content;
      }
      else {
          die $res->status_line;
      }


      holli

      You can lead your users to water, but alas, you cannot drown them.

        Weird, $ua->request() crashes for me. Perl reports no error at all (the crash is uncontrolled). The system exception code c0000005 stands for a memory access violation (an attempt to read from or write to an invalid address). What's your Perl version, and can you share your ssleay32.dll and libeay32.dll?

        Event: APPCRASH
        Application: perl.exe
        App version: 5.26.3.2603
        Error module: perl526.dll
        Error module version: 0.0.0.0
        Exception code: c0000005
        Exception offset: 000000000013ada7
        OS version: 6.3.9600.2.0.0.256.4
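For comparing setups, a short script along these lines could print the versions of Perl, LWP and the SSL-related modules on each machine. It is only a diagnostic sketch: the choice of modules to report is an assumption about what is relevant here, and it does not inspect the DLLs themselves.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Version details worth posting alongside a crash report.
my %versions = (
    'perl'           => sprintf('%vd', $^V),
    'LWP::UserAgent' => LWP::UserAgent->VERSION,
);

# The SSL-related modules are optional, so load them defensively.
for my $module (qw(Net::SSLeay IO::Socket::SSL LWP::Protocol::https)) {
    $versions{$module} = eval "require $module; 1"
        ? $module->VERSION
        : 'not installed';
}

# One "name version" pair per line, sorted by name.
for my $name (sort keys %versions) {
    printf "%-22s %s\n", $name, $versions{$name} // 'unknown';
}
```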
Re^2: Problem with LWP::UserAgent
by Paradigma (Novice) on Jun 23, 2019 at 12:23 UTC
    my $ua = LWP::UserAgent->new(
        'agent' => 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    );
    my $res = $ua->get($url, 'Content-Length' => 0, 'Accept-Encoding' => 'gzip');
    if ($res->is_success) {
        my $tree = HTML::TreeBuilder::XPath->new_from_content(Compress::Zlib::memGunzip($res->content()));
        ...
        ...
    }

    ^^ This doesn't work either

      The code holli posted works for me just fine. Adding the gzip encoding to the headers also works fine, but it works without it as well. The code you posted last works if you replace Compress::Zlib::memGunzip($res->content()) with $res->decoded_content
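To illustrate the difference between content and decoded_content without touching the network, here is a small sketch that builds two responses by hand, one gzip-compressed and one not. The sample HTML and the hand-built responses are made up for the illustration; the point is that decoded_content looks at the Content-Encoding header and handles both cases, whereas unconditional gunzipping assumes the body is always compressed.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

my $html = '<html><body>hello</body></html>';

# Compress the sample page in memory, as a gzip-enabled server would.
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# A response whose body is gzip-compressed and labelled as such.
my $gzipped = HTTP::Response->new(200, 'OK', [
    'Content-Type'     => 'text/html; charset=UTF-8',
    'Content-Encoding' => 'gzip',
], $compressed);

# A response for the same page sent without compression.
my $plain = HTTP::Response->new(200, 'OK', [
    'Content-Type' => 'text/html; charset=UTF-8',
], $html);

for my $res ($gzipped, $plain) {
    # content() is the raw body; decoded_content() undoes any
    # Content-Encoding and the charset before returning text.
    printf "raw length %d, decoded: %s\n",
        length $res->content, $res->decoded_content;
}
```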
