
Re^2: WWW:Mechanize bug?

by fraizerangus (Sexton)
on Oct 13, 2011 at 19:45 UTC


in reply to Re: WWW:Mechanize bug?
in thread WWW:Mechanize bug?

Monks

Thanks so much for the help! I did get it working; however, it only seems to fetch the first 7 links, and then this error message appears:

Internal Server Error at newp line 14

Using the following code:

#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Storable;

my $mech_cgi = WWW::Mechanize->new;
$mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );
my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );

for(my $i = 0; $i < @cgi_links; $i++) {
    print "following link: ", $cgi_links[$i]->url, "\n";
    $mech_cgi->follow_link( url => $cgi_links[$i]->url )
        or die "Error following link ", $cgi_links[$i]->url;
    $mech_cgi->back;
}

Is this a fault with their server or with my script?

Many thanks and best wishes,

Dan

Re^3: WWW:Mechanize bug?
by Anonymous Monk on Oct 13, 2011 at 23:11 UTC

    is this a fault with their server or my script?

    Can't say; that error message isn't very informative.

    Try

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use WWW::Mechanize;
    my $mech_cgi = WWW::Mechanize->new ( autocheck => 1 );
    $mech_cgi->show_progress(1);
    $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );
    my @Motion = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );
    @Motion = map { $_->url_abs() } @Motion;
    for my $link ( @Motion ){
        eval {
            $mech_cgi->get( $link );
            1;
        } or warn $@, "\n", $mech_cgi->res->as_string, "\n", '#'x33, "\n\n";
        $mech_cgi->back;
    }
    __END__
    And you'll get something more informative:

    ** GET http://www.molmovdb.org/cgi-bin/browse.cgi ==> 202 OK
    ...
    ** GET http://..../4040404 ==> 404 Not Found
    Error GETing http://..../4040404: Not Found at somefile.pl line 12
    HTTP/1.1 404 Not Found
    Connection: close
    Date: Thu, 13 Oct 2011 23:01:51 GMT
    ...
    Content-Length: 3942
    Content-Type: text/html
    Client-Date: Thu, 13 Oct 2011 23:05:18 GMT
    ...
    Title: blah blah blah

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    ....
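
If the goal is to decide whether the server or the script is at fault, another option is to turn autocheck off and record the status of every link instead of stopping at the first failure. The following is only a rough sketch of that idea: the helper name failed_fetches and the choice to pass the index page URL on the command line are assumptions made for illustration, not something from the thread. It also escapes the dot in the regex, which the code above leaves unescaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not from the thread): fetch each URL once and
# return [ url, status code, status message ] for every request that failed.
sub failed_fetches {
    my ( $mech, @urls ) = @_;
    my @failed;
    for my $url (@urls) {
        $mech->get($url);    # with autocheck off, HTTP errors do not die
        next if $mech->success;
        push @failed, [ $url, $mech->status, $mech->res->message ];
    }
    return @failed;
}

# Assumption: the index page URL is passed as the first command-line argument.
if ( defined( my $start = shift @ARGV ) ) {
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->get($start);
    die "Cannot fetch $start: ", $mech->status, "\n" unless $mech->success;

    # Absolute URLs, so no back() is needed between requests.
    my @urls = map { $_->url_abs->as_string }
        $mech->find_all_links( url_regex => qr/motion\.cgi/ );

    my @failed = failed_fetches( $mech, @urls );
    printf "%s %s  %s\n", $_->[1], $_->[2], $_->[0] for @failed;
    printf "%d of %d links failed\n", scalar @failed, scalar @urls;
}
```

Run it with the index page as the argument, for example: perl check_links.pl http://www.molmovdb.org/cgi-bin/browse.cgi (the script name is made up). If only a handful of links come back with a 500 and the rest succeed, that would suggest the problem lies with those pages on the server rather than with the loop itself.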
