PerlMonks  

Error when calling $mechanize->content;

by Jacob_Kold (Initiate)
on Apr 18, 2018 at 13:44 UTC

Jacob_Kold has asked for the wisdom of the Perl Monks concerning the following question:

How come this webpage gives me an internal server error when I try to scrape it with WWW::Mechanize?

my $mech = WWW::Mechanize->new();
$mech->get( https://www.aabenraa.dk/politik-og-dialog/politiske-udvalg/arbejdsmarkedsudvalget/medlemmer );
my $html_udv = $mech->content;

And why do I get some HTML containing the line "Object reference not set to an instance of an object" when I try to scrape the same webpage with LWP::UserAgent?

my $ua = LWP::UserAgent->new( ssl_opts => { verify_hostname => 0 }, );
my $res = $ua->get('https://www.aabenraa.dk/politik-og-dialog/politiske-udvalg/arbejdsmarkedsudvalget/medlemmer');
my $html = $res->content;

What can I do to get the HTML of the webpage and not get any errors?

Re: Error when calling $mechanize->content;
by marto (Cardinal) on Apr 18, 2018 at 14:15 UTC

    Your first example is missing the single quotes around the working URL. Your second example has a typo in the URL, which returns 'This page was not found' (Google translated).

    https://www.aabenraa.dk/politik-og-dialog/politiske-udvalg/arbejdsmarkedsudvalget/medlemmer
    https://www.aabenraa.dk/politik-og-dialog/politisk-udvalg/arbejdsmarkedsudvalget/medlemmer

    Even when setting a valid UA string, I get the ASP.NET error 'NullReferenceException: Object reference not set to an instance of an object.', along with some debug info best kept away from prying eyes. Something on the server side is certainly making the page difficult to screen-scrape. Perhaps they don't want you to.
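
For anyone reproducing this, here is a minimal sketch of how the failure could be inspected rather than having the script die. It assumes that WWW::Mechanize's autocheck is on by default, which is what would turn a 500 response into a fatal error. The helper names `diagnose` and `fetch_page` are hypothetical and not from the thread, and the user agent alias is just one plausible choice. This does not make the site return the page; it only shows what the server sent back.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a response from its status code and body.
sub diagnose {
    my ($status, $body) = @_;
    return "HTTP $status: request failed"
        if $status < 200 || $status >= 300;
    return "HTTP $status, but body contains an ASP.NET error page"
        if $body =~ /Object reference not set to an instance of an object/;
    return "HTTP $status: looks fine";
}

# Hypothetical helper: fetch a URL without dying on HTTP errors.
sub fetch_page {
    my ($url) = @_;
    require WWW::Mechanize;    # loaded only when a fetch is actually made
    # autocheck => 0 lets us look at a 500 response instead of dying on it
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->agent_alias('Windows Mozilla');    # assumption: any browser-like UA
    my $res = $mech->get($url);               # note: the URL is a quoted string
    return ( $res->code, $res->decoded_content // '' );
}

# Run only when a URL is passed on the command line.
if (@ARGV) {
    my ($status, $body) = fetch_page( $ARGV[0] );
    print diagnose($status, $body), "\n";
}
```

Saved as, say, check.pl, it could be run as `perl check.pl 'https://www.aabenraa.dk/politik-og-dialog/politiske-udvalg/arbejdsmarkedsudvalget/medlemmer'`. Quoting the URL in the shell matters for the same reason it matters in the Perl source.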
