Re: stuck with WWW::Mechanize drop down list

The dropdown selector uses JavaScript to reload the page. It's dorky:

<select id="_size" name="size" onchange="var s=sURL + '&size=' + this.value; document.location.href=s"><option value="all">All</option><option value="50">50</option><option value="100">100</option></select>

We can simulate this by adding "&size=all" to the URL. We'll do this by setting an extra field entry:

$browser->field( 'size', 'all' );
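
If you'd rather put the parameter on the URL literally, the way the onchange handler does, a minimal sketch with the URI module might look like the following. Note that with_size is a hypothetical helper name, and the '?term=Escherichia' query string in the usage line is an assumption for illustration, not something taken from the site.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return $url with its 'size' query parameter set to $size,
# replacing any 'size' value that is already present.
sub with_size {
    my ( $url, $size ) = @_;
    my $uri   = URI->new($url);
    my @pairs = $uri->query_form;    # flat list of key/value pairs
    my @kept;
    while ( my ( $key, $value ) = splice @pairs, 0, 2 ) {
        push @kept, $key => $value unless $key eq 'size';
    }
    $uri->query_form( @kept, size => $size );
    return $uri->as_string;
}

# Assumed URL shape, for illustration only.
print with_size( 'http://www.ncbi.nlm.nih.gov/Traces/wgs/?term=Escherichia', 'all' ), "\n";
```

The result can be handed straight to $browser->get(...). This mirrors what the script in the page does (append '&size=' plus the selected value and reload).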

Example:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw/ open close /;
use 5.012;
use WWW::Mechanize;

# create WWW::Mechanize object
# autocheck => 1 checks each request and dies if it was not successful
my $browser = WWW::Mechanize->new( autocheck => 1 );

# retrieve page
$browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

# select form to fill based on mech-dump output
$browser->form_number(1);

# fill field 'term' with name of species
$browser->field( 'term', 'Escherichia' );
$browser->field( 'size', 'all' );


# submit the form (submit() takes no button argument)
$browser->submit();

my $url = $browser->uri;
print "url: $url\n";

# launch browser to test url
#system( 'firefox', $url );
print $browser->content();

Re^2: stuck with WWW::Mechanize drop down list by spazm (Monk) on Jun 02, 2012 at 01:02 UTC
Now that you have the full list, you'd like to follow the link for "Download as TAB delimited list". In your browser, following the link leads to a saved file. In Mechanize, the response is just more content. If you want to be clever, you can get the filename from LWP's HTTP::Response and use it to name the file you write out.

$browser->follow_link( text_regex => qr/Download as TAB/i );
print $browser->content();    # prints TAB delimited file to STDOUT

Or, to save under the filename reported by the response:

$browser->follow_link( text_regex => qr/Download as TAB/i );
if ( my $filename = $browser->res->filename ) {
    die "file already exists [$filename]" if -e $filename;
    print STDERR "Saving downloaded file to [$filename]\n";
    open my $fh, ">", $filename;
    print $fh $browser->content;
    close $fh;
}

Full example:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw/ open close /;
use 5.012;
use WWW::Mechanize;

# create WWW::Mechanize object
# autocheck => 1 checks each request and dies if it was not successful
my $browser = WWW::Mechanize->new( autocheck => 1 );

# retrieve page
$browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

# select form to fill based on mech-dump output
$browser->form_number(1);

# fill field 'term' with name of species
$browser->field( 'term', 'Escherichia' );
$browser->field( 'size', 'all' );

# submit the form (submit() takes no button argument)
$browser->submit();

my $url = $browser->uri;
print "url: $url\n";

$browser->follow_link( text_regex => qr/Download as TAB/i );
#print $browser->content();    # prints TAB delimited file to STDOUT

# save the response under the filename it reports, refusing to overwrite
if ( my $filename = $browser->res->filename ) {
    die "file already exists [$filename]" if -e $filename;
    print STDERR "Saving downloaded file to [$filename]\n";
    open my $fh, ">", $filename;
    print $fh $browser->content;
    close $fh;
}
Re^3: stuck with WWW::Mechanize drop down list by abualiga (Scribe) on Jun 02, 2012 at 03:58 UTC
Spazm, thanks much, especially for the explanations! A question: if mech-dump doesn't output the contents of drop-down lists, do I always need to look at the page source and then add the selection as a 'field' entry?
Re^4: stuck with WWW::Mechanize drop down list by spazm (Monk) on Jun 02, 2012 at 06:09 UTC
I was just about to suggest mech-dump; good that you are already using it! Mechanize will only return form elements that are within <form></form> tags. The "All" dropdown is not within a set of form tags; it directly triggers JavaScript to reload the page. In cases like this you just have to figure out what the script is doing and duplicate it, possibly just by inspecting the request URL submitted by the browser. This is an area where scraping pages becomes tedious and tricky.
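
To take some of the tedium out of reading the page source, here is a rough sketch that lists every <select> carrying an onchange handler, along with its option values, using HTML::TokeParser (part of HTML::Parser, which Mechanize already depends on). The name scripted_selects is a hypothetical helper, and the sample HTML is just the dropdown from this thread plus a made-up form; with a live page you would pass $browser->content instead.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return one hashref per <select> that has an onchange
# attribute, holding its name, the handler text, and its option values.
sub scripted_selects {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html );
    my ( @found, $current );
    while ( my $token = $parser->get_token ) {
        my ( $type, $tag, $attr ) = @$token;
        if ( $type eq 'S' && $tag eq 'select' && defined $attr->{onchange} ) {
            $current = {
                name     => $attr->{name},
                onchange => $attr->{onchange},
                values   => [],
            };
            push @found, $current;
        }
        elsif ( $type eq 'S' && $tag eq 'option' && $current ) {
            push @{ $current->{values} }, $attr->{value};
        }
        elsif ( $type eq 'E' && $tag eq 'select' ) {
            undef $current;    # options after this belong to another select
        }
    }
    return @found;
}

# Sample input: the dropdown quoted earlier in the thread, after an invented form.
my $html = <<'HTML';
<form><input name="term"></form>
<select id="_size" name="size" onchange="var s=sURL + '&size=' + this.value; document.location.href=s"><option value="all">All</option><option value="50">50</option><option value="100">100</option></select>
HTML

for my $select ( scripted_selects($html) ) {
    print "name:     $select->{name}\n";
    print "values:   @{ $select->{values} }\n";
    print "onchange: $select->{onchange}\n";
}
```

It only tells you which dropdowns are script-driven and what values they offer. You still have to read the handler to see what request it builds.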
Re^5: stuck with WWW::Mechanize drop down list by abualiga (Scribe) on Jun 02, 2012 at 13:22 UTC


"be consistent"