PerlMonks
WWW:Mechanize bug?

by fraizerangus (Sexton)
on Oct 12, 2011 at 21:40 UTC ( id://931099 )

fraizerangus has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I'm working with WWW::Mechanize, which seems to be the right medicine for what I need, but I've already hit a snag. I'm only interested in following the 'motion.cgi' links and extracting them as text documents, but the regex I've used only finds the first two links. Does anybody have any ideas about what's going on?

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;
    use Storable;

    my $mech_cgi = WWW::Mechanize->new;
    $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

    my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi?/ );

    for ( my $i = 0; $i < @cgi_links; $i++ ) {
        print "following link: ", $cgi_links[$i]->url, "\n";
        $mech_cgi->follow_link( url => $cgi_links[$i]->url )
            or die "Error following link ", $cgi_links[$i]->url;
    }
best wishes

Dan

Replies are listed 'Best First'.
Re: WWW:Mechanize bug?
by jethro (Monsignor) on Oct 12, 2011 at 23:10 UTC

    One small bug is that the '?' in your regex is special, meaning 0 or 1 occurrences of the previous character, in this case 'i'. You probably want '\?' instead. The same goes for '.', which matches any character and should be '\.'. But that can't be your problem, because as written the regex is more general than the correct one would be.
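    To see the difference concretely, here is a small offline sketch; the sample strings are made up for illustration, not taken from the real site:

    ```perl
    # The unescaped pattern: '.' matches any character, '?' makes the 'i' optional.
    my $loose  = qr/motion.cgi?/;
    # The escaped pattern: literal '.' and literal '?'.
    my $strict = qr/motion\.cgi\?/;

    for my $s ( 'motionXcgZZZ', 'motion.cgi?ID=abc' ) {
        printf "%-20s loose=%-8s strict=%s\n",
            $s,
            ( $s =~ $loose  ? 'match' : 'no match' ),
            ( $s =~ $strict ? 'match' : 'no match' );
    }
    ```

    The loose pattern happily matches 'motionXcgZZZ' (the '.' swallowed the 'X' and the optional 'i' was dropped), while the strict pattern only matches real motion.cgi query URLs.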

    You could remove the 'url_regex' parameter to test whether you get all the links when there are no restrictions at all. Then use a url_regex of qr/mot/ and slowly add to the regex until your links are no longer found.
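    That narrowing-down process can be sketched offline against a few sample URL strings (the URLs here are hypothetical, not taken from the real site):

    ```perl
    # Start broad, then tighten the pattern step by step and watch
    # where the count of matched links drops.
    my @urls = (
        'http://example.org/cgi-bin/motion.cgi?ID=abc',
        'http://example.org/cgi-bin/motion.cgi?ID=def',
        'http://example.org/cgi-bin/browse.cgi?group=x',
    );

    for my $pat ( qr/cgi/, qr/motion/, qr/motion\.cgi\?ID=/ ) {
        my @hits = grep { /$pat/ } @urls;
        print "$pat -> ", scalar(@hits), " link(s)\n";
    }
    ```

    The first pattern that loses links you expected to keep is the one that is too strict.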

Re: WWW:Mechanize bug?
by Anonymous Monk on Oct 13, 2011 at 03:27 UTC

    the regex I've used only finds the first 2 links? Anybody any ideas on what's going on?

    You're confused about how a browser works:

    you GET first/url

    you collect the list of links from first/url

    following the first link takes you to second/url

    second/url no longer contains the links from first/url, so the next follow_link has nothing to follow

    Either rewind the browser (back), or use get, not follow_link.

      Monks

      Thanks so much for the help! I did get it working; however, it only fetches the first 7 links before this error message appears:

      Internal Server Error at newp line 14

      Using the following code:

      #!/usr/bin/perl
      use strict;
      use WWW::Mechanize;
      use Storable;

      my $mech_cgi = WWW::Mechanize->new;
      $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

      my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );

      for ( my $i = 0; $i < @cgi_links; $i++ ) {
          print "following link: ", $cgi_links[$i]->url, "\n";
          $mech_cgi->follow_link( url => $cgi_links[$i]->url )
              or die "Error following link ", $cgi_links[$i]->url;
          $mech_cgi->back;
      }

      Is this a fault with their server or with my script?

      many thanks and best wishes

      Dan

        is this a fault with their server or my script?

        Can't say; that error message isn't very informative.

        Try

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech_cgi = WWW::Mechanize->new( autocheck => 1 );
        $mech_cgi->show_progress(1);
        $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

        my @Motion = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );
        @Motion = map { $_->url_abs() } @Motion;

        for my $link ( @Motion ) {
            eval {
                $mech_cgi->get( $link );
                1;
            } or warn $@, "\n", $mech_cgi->res->as_string, "\n", '#' x 33, "\n\n";
            $mech_cgi->back;
        }
        __END__
        And you'll get something more informative:

        ** GET http://www.molmovdb.org/cgi-bin/browse.cgi ==> 202 OK
        ...
        ** GET http://..../4040404 ==> 404 Not Found
        Error GETing http://..../4040404: Not Found at somefile.pl line 12
        HTTP/1.1 404 Not Found
        Connection: close
        Date: Thu, 13 Oct 2011 23:01:51 GMT
        ...
        Content-Length: 3942
        Content-Type: text/html
        Client-Date: Thu, 13 Oct 2011 23:05:18 GMT
        ...
        Title: blah blah blah
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
        ....

Node Type: perlquestion [id://931099]
Approved by Corion