PerlMonks  

Re: running an example script with WWW::Mechanize* module

by Corion (Pope)
on Apr 13, 2020 at 06:52 UTC


in reply to running an example script with WWW::Mechanize* module

WWW::Mechanize::Chrome doesn't inherit from WWW::Mechanize, but it strives to provide the same API as WWW::Mechanize where possible/applicable.

In some situations, I've deviated from the WWW::Mechanize API unfortunately, instead of either using a different method name or expanding the API in a compatible way...

Re^2: running an example script with WWW::Mechanize* module
by Aldebaran (Deacon) on Apr 19, 2020 at 01:25 UTC
    WWW::Mechanize::Chrome doesn't inherit from WWW::Mechanize, but it strives to provide the same API as WWW::Mechanize where possible/applicable.

    Okay, and this is because you don't have this in WMC:

    use base qw(WWW::Mechanize);

    right? I have to wonder whether you considered other namespaces for this, particularly since I found out that WWW::Mechanize inherits from LWP::UserAgent:

    $ pwd
    /usr/local/share/perl/5.26.1/WWW/Mechanize
    $ ..
    $ ls
    Mechanize  Mechanize.pm
    $ cat Mechanize.pm | more
    package WWW::Mechanize;
    #ABSTRACT: Handy web browsing in a Perl object
    use strict;
    use warnings;
    our $VERSION = '1.96';
    use Tie::RefHash;
    use HTTP::Request 1.30;
    use LWP::UserAgent 5.827;
    use HTML::Form 1.00;
    use HTML::TokeParser;
    use Scalar::Util qw(tainted);
    use base 'LWP::UserAgent';

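
The difference between inheriting from a class and merely offering the same methods can be sketched with stand-in classes. The packages `Demo::Agent`, `Demo::Mech`, and `Demo::ChromeLike` below are hypothetical and exist only for illustration; they are not the real CPAN modules and say nothing about how those are implemented internally.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; none of these are the real CPAN modules.
package Demo::Agent {
    sub new { my ($class) = @_; return bless {}, $class }
    sub get { return "fetched $_[1]" }
}

package Demo::Mech {
    # Inherits new() and get() from Demo::Agent.
    our @ISA = ('Demo::Agent');
    sub follow_link { return 'followed' }
}

package Demo::ChromeLike {
    # No @ISA entry: the same method names are reimplemented, not inherited.
    sub new         { my ($class) = @_; return bless {}, $class }
    sub get         { return "fetched $_[1] via browser" }
    sub follow_link { return 'followed via browser' }
}

package main;

# isa() answers "does it inherit?", can() answers "does it offer the method?".
for my $class (qw(Demo::Mech Demo::ChromeLike)) {
    my $obj = $class->new;
    printf "%-16s isa Demo::Agent: %-3s  can get: %s\n",
        $class,
        ( $obj->isa('Demo::Agent') ? 'yes' : 'no' ),
        ( $obj->can('get')         ? 'yes' : 'no' );
}
```

Code that checks `can` rather than `isa` keeps working with either kind of class, which is the practical meaning of "same API without inheritance".
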
    In some situations, I've deviated from the WWW::Mechanize API unfortunately, instead of either using a different method name or expanding the API in a compatible way...

    From all of this background reading of what happened over the last 20 years of the internet and perl, the idea that one would want to faithfully represent in every detail what worked in 2003 with what works in 2015 seems like folly. I'll try out the new stuff and see how I do with it.

    I hauled out one of my favorite WM scripts only to find that it doesn't populate values correctly anymore, so I'm ready to start using newer tools. So far I have achieved only a minor amount of success. Between readmore tags I'll post the older script that I'm trying to modernize:

    I'm fairly confident that it behaved and produced accurate results. (There is a chance that it was a script that I was trying to extend and lost my way. I can't always tell them apart.) Now let's look at how far I've gotten with WMC:

    #! /usr/bin/perl
    use warnings;
    use strict;
    use WWW::Mechanize::Chrome;
    use HTML::TableExtract qw(tree);
    use open ':std', OUT => ':utf8';
    use Prompt::Timeout;
    use constant TIMEOUT  => 3;
    use constant MAXTRIES => 30;
    ## redesign for solar eclipse of aug 21, 2017
    ### begin 2020 rewrite
    ### with WWW::Mechanize::Chrome
    ### and Log::Log4perl
    use Log::Log4perl;
    use Data::Dump;
    use 5.016;
    my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
    my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";
    #Log::Log4perl::init($log_conf3); #debug
    Log::Log4perl::init($log_conf4); #info
    my $logger = Log::Log4perl->get_logger();
    my $current_level = $logger->level();
    $logger->info("script begins with $current_level");
    my $a = 'b';
    for my $i ( 1 .. 2 ) {
      say "i is $i";
      $logger->info("i is $i");
      my $site = 'http://www.fourmilab.ch/yoursky/cities.html';
      my $mech = WWW::Mechanize::Chrome->new( headless => 1, );
      $mech->get($site);
      $mech->follow_link( text_regex => qr/Portland OR/i );
      say "We are at " . $mech->uri;
      if ( $mech->success() ) {
        open my $gh, '>', "$a.form-log.txt"
          or warn "Couldn't open logfile $a.form-log.txt $!";
        $mech->dump_forms($gh);
        say $gh "=========";
      }
      my $guess = 2458960; #Earth day 2020 in julian days
      $mech->form_number($i);
      say "$i works" if $mech->success();
      say $mech->current_form->{name}; # ??
      say "current form has a name" if $mech->success();
      ## syntax that used to work with WWW::Mechanize
      # $mech->set_fields(qw'date 2');
      #$mech->set_fields();
      $mech->field( date => '2' ); ## analogs to set set_fields in WM
      say "first field set succeeded" if $mech->success();
      $mech->field( jd => $guess );
      say "second field set succeeded" if $mech->success();
      $mech->click_button( value => "Update" ); # this seems similar to WM
      say "clickbutton succeeded" if $mech->success();
      my $string = $mech->uri;
      $logger->info("We are at $string") if $mech->success();
      ## get a screenshot of how far we made it
      my $page_png = $mech->content_as_png();
      my $base = '/home/hogan/5.scripts/1.corion./template_stuff/aimages';
      my $fn = $base . "/$a.png";
      open my $fh, '>', $fn
        or die "Couldn't create '$fn': $!";
      binmode $fh, ':raw';
      print $fh $page_png;
      close $fh;
      print "exiting show_screen with letter $a\n";
      my $n = 2;
      $logger->info("sleeping for $n seconds ====================");
      $mech->sleep($n);
      $a++;
    }

    Terminal output:

    $ ./5.pluto.pl
    script begins with 20000
    i is 1
    i is 1
    Connected to ws://127.0.0.1:43601/devtools/browser/0c9af664-f568-4f8a-bd7f-496dd9593030
    We are at http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=45.5183&ns=North&lon=122.676&ew=West
    1 works
    Use of uninitialized value in say at ./5.pluto.pl line 55.
    current form has a name
    <no text> at /usr/local/share/perl/5.26.1/WWW/Mechanize/Chrome.pm line 3779.
    <no text> at /usr/local/share/perl/5.26.1/WWW/Mechanize/Chrome.pm line 3779.
    <no text> at /usr/local/share/perl/5.26.1/WWW/Mechanize/Chrome.pm line 3779.
    3 elements found for input with name 'date' at ./5.pluto.pl line 63.
    $

    The log output shows that I've been jimmying with the loop values that I'm using to figure out which form I need. It turns out that form_number is not zero-based in this context: 1 works, and zero bombs out.

    2020/04/18 17:18:49 INFO script begins with 20000
    2020/04/18 17:18:49 INFO Connected to ws://127.0.0.1:35721/devtools/browser/37b006e1-5a92-4191-b50f-7475ff4d12d9
    2020/04/18 17:24:03 INFO script begins with 20000
    2020/04/18 17:24:03 INFO i is 0
    2020/04/18 17:24:03 INFO Connected to ws://127.0.0.1:38435/devtools/browser/79feea55-b183-4da4-9111-b43ddb825bdd
    2020/04/18 17:26:33 INFO script begins with 20000
    2020/04/18 17:26:33 INFO i is 1
    2020/04/18 17:26:33 INFO Connected to ws://127.0.0.1:43601/devtools/browser/0c9af664-f568-4f8a-bd7f-496dd9593030

    Whilst far short of a grand opus or masterpiece, this script does something that I couldn't manage with WM, namely, effective logging. Log::Log4perl is a dependency of WMC, so that functionality comes along with it. The disadvantage is that you have to get it installed, which has been a (fixable) problem for some. It didn't want to install with my Strawberry Perl on Windows 10, which I house on another partition. Two ways to solve the problem are given in getting Log::Log4perl to install on windows strawberry perl.

    So, where am I stuck? Well, this is hot off the press and represents several similar attempts. It's nice to be using WMC and Log4perl to figure this out. You can't read everything the machine reports on STDOUT, because the volume is overwhelming. My partial results are encouraging, and this seems very much like a problem of getting forms and fields set and selected with new method calls. Here is the uri we're looking at. It's a fun site, and you can readily enter your own information.

    I do have data from the formdump:

    [FORM] request /cgi-bin/Yoursky
    [INPUT (submit)] <no name>
    [INPUT (radio)] date
    [INPUT (radio)] date
    [INPUT (text)] utc
    [INPUT (radio)] date
    [INPUT (text)] jd
    [INPUT (text)] lat
    [INPUT (radio)] ns
    [INPUT (radio)] ns
    [INPUT (text)] lon
    [INPUT (radio)] ew
    [INPUT (radio)] ew
    [INPUT (checkbox)] coords
    [INPUT (checkbox)] moonp
    [INPUT (checkbox)] deep
    [INPUT (text)] deepm
    [INPUT (checkbox)] consto
    [INPUT (checkbox)] constn
    [INPUT (checkbox)] consta
    [INPUT (checkbox)] consts
    [INPUT (checkbox)] constb
    [INPUT (text)] limag
    [INPUT (checkbox)] starn
    [INPUT (text)] starnm
    [INPUT (checkbox)] starb
    [INPUT (text)] starbm
    [INPUT (checkbox)] flip
    [INPUT (text)] imgsize
    [INPUT (text)] fontscale
    [SELECT (select-one)] scheme
    [INPUT (checkbox)] edump
    [TEXTAREA (textarea)] elements
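
A dump like this can be scanned for field names that occur more than once, which is what the "3 elements found for input with name 'date'" message in the terminal output is complaining about. The following is a minimal sketch that works on the dump text only; the helper name `count_field_names` is made up for illustration, and the embedded text is a shortened copy of the dump above.

```perl
use strict;
use warnings;

# A shortened copy of the dump_forms output quoted in this post.
my $dump = <<'END';
[FORM] request /cgi-bin/Yoursky
[INPUT (submit)] <no name>
[INPUT (radio)] date
[INPUT (radio)] date
[INPUT (text)] utc
[INPUT (radio)] date
[INPUT (text)] jd
[INPUT (text)] lat
[INPUT (radio)] ns
[INPUT (radio)] ns
END

# Count how often each field name occurs in a dump (hypothetical helper).
sub count_field_names {
    my ($text) = @_;
    my %count;
    for my $line ( split /\n/, $text ) {
        # Match lines such as "[INPUT (radio)] date"; the [FORM] header is skipped.
        next unless $line =~ /^\s*\[(?:INPUT|SELECT|TEXTAREA) \([^)]*\)\]\s+(.+?)\s*$/;
        $count{$1}++;
    }
    return \%count;
}

# Report only the names that are ambiguous when addressed by name alone.
my $count = count_field_names($dump);
for my $name ( sort grep { $count->{$_} > 1 } keys %$count ) {
    print "$name appears $count->{$name} times\n";
}
```

Radio groups naturally share one name, so repeated names are expected here; the point is only that such fields cannot be addressed by name alone.
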

    Fishing for tips. Thanks, Corion, for your response and this considerable achievement:

    $ wc -l $(locate Chrome.pm)
      5761 /home/hogan/Documents/repos/wmc/WWW-Mechanize-Chrome/lib/WWW/Mechanize/Chrome.pm
      5708 /usr/local/share/perl/5.26.1/WWW/Mechanize/Chrome.pm
     11469 total
    $

      There is a misunderstanding of $mech->success - this method only reflects whether the last HTTP response from the server is considered an error or not. It does not reflect whether the last operation on $mech was successful or not. WWW::Mechanize::Chrome usually signals errors by calling die.
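
Because failures are reported through die rather than through success(), each step can be wrapped in eval to find out whether it completed. The sketch below uses a hypothetical stand-in class, `Demo::StubMech`, so that it runs without a browser; its behaviour only imitates what is described above, and `try_step` is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a browser object, so the sketch runs without Chrome.
# field() dies on an ambiguous name; success() only reflects the HTTP status.
package Demo::StubMech {
    sub new     { return bless { http_ok => 1 }, shift }
    sub success { return $_[0]{http_ok} }
    sub field {
        my ( $self, $name, $value ) = @_;
        die "3 elements found for input with name '$name'\n" if $name eq 'date';
        return;
    }
}

# Run one step and report whether it finished without dying.
sub try_step {
    my ( $label, $code ) = @_;
    my $ok  = eval { $code->(); 1 };
    my $err = $ok ? '' : ( $@ || 'unknown error' );
    chomp $err;
    my $msg = $ok ? "$label: ok\n" : "$label: failed ($err)\n";
    print $msg;
    return $ok ? 1 : 0;
}

my $mech    = Demo::StubMech->new;
my $jd_ok   = try_step( 'set jd',   sub { $mech->field( jd   => 2458960 ) } );
my $date_ok = try_step( 'set date', sub { $mech->field( date => '2' ) } );

# success() is unchanged by the failed step, because no HTTP request was involved.
my $http = $mech->success ? 'true' : 'false';
print "success() says: $http\n";
```

With a real object, the same `try_step` wrapper could replace the `say "..." if $mech->success()` lines, since those report the last HTTP response rather than the outcome of the call before them.
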

      I haven't run your code, but the log output suggests that the form you're looking at has no name:

      say $mech->current_form->{name}; # ??
      # Use of uninitialized value in say at ./5.pluto.pl line 55.

      The form is not great, because it really contains three fields with the same name date, so you will have to fetch the individual fields and explicitly set them:

      # largely untested
      my @date_fields = $mech->selector('.//*[@name="date"]', node => $self->current_form );
      $mech->set_field( $date_fields[1] => $guess );

      In the next version, I'll actually implement the arrayref form of ->set_fields() for values of index larger than one :) But that means breaking my (incompatible) API to restore the WWW::Mechanize API so I'll have to look carefully there.

      $mech->set_fields( $name => [ 'foo', 2 ] );
        The form is not great, because it really contains three fields with the same name date, so you will have to fetch the individual fields and explicitly set them:

        Thanks, Corion, I think we're almost there. I've got this pared down as far as I can to make an SSCCE. I can't get perl to think I have a valid selector:

        $ ./6.1.pluto.pl
        say (...) interpreted as function at ./6.1.pluto.pl line 47.
        2020/04/20 20:10:39 Connected to ws://127.0.0.1:40757/devtools/browser/10f7d706-4dbd-4073-a438-916f6602fb4c
        found the one and only form
        Invalid rule, couldn't parse '//*[@name="date", node => $mech->current_form ]' at /usr/local/share/perl/5.26.1/HTML/Selector/XPath.pm line 283.
        $

        Source, with the critical line tried several different ways:

        #! /usr/bin/perl
        use warnings;
        use strict;
        use WWW::Mechanize::Chrome;
        use Log::Log4perl qw(:easy);
        use Data::Dump;
        use 5.016;
        Log::Log4perl->easy_init($INFO);
        my $site = 'https://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=45.5183&ns=North&lon=122.676&ew=West';
        my $mech = WWW::Mechanize::Chrome->new( headless => 1, );
        $mech->get($site);
        my $guess = 2458960; #Earth day 2020 in julian days
        $mech->form_number(1);
        say "found the one and only form" if $mech->success();
        # my best guess...aka...trial 1
        # $mech->field( date => '2', jd => $guess );
        # stderr: 3 elements found for input with name 'date' at ./6.1.pluto.pl line 21.
        # your first guess...aka trial 2
        # my @date_fields =$mech->selector( './/*[@name="date"]', node => $self->current_form );
        # stderr: Global symbol "$self" requires explicit package name
        # 3rd guess
        #my @date_fields =$mech->selector( './/*[@name="date"]', node => $mech->current_form );
        # stderr: Invalid rule, couldn't parse '//*[@name="date"]' at /usr/local/share/perl/5.26.1/HTML/Selector/XPath.pm line 283.
        # 4th guess
        # my @date_fields =$mech->selector( './/*[@name="date", node => $self->current_form ]');
        # stderr: Invalid rule, couldn't parse '//*[@name="date", node => $self->current_form ]'
        # 5th guess
        my @date_fields =$mech->selector( './/*[@name="date", node => $mech->current_form ]');
        # Invalid rule, couldn't parse '//*[@name="date", node => $mech->current_form ]'
        $mech->set_field( $date_fields[1] => $guess );
        $mech->click_button( value => "Update" ); # this seems similar to WM
        say "clickbutton succeeded" if $mech->success();
        my $string = $mech->uri;
        say ("We are at $string") if $mech->success();

        That lays it out there as starkly as I can. Many thanks and greetings from America.
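
The "Invalid rule" messages are raised by HTML::Selector::XPath, a module that translates CSS selectors into XPath, which suggests that selector() expects a CSS selector rather than an XPath expression. In the fourth and fifth attempts the `node => ...` option also ended up inside the quoted string. The following is a sketch under the assumption that the object offers an `xpath( $query, %options )` method with a `node` option, as the WWW::Mechanize::Chrome documentation describes; it has not been run against a live browser. `Demo::RecordingMech` is a hypothetical stand-in that only records the call, and `find_date_fields` is a made-up helper name.

```perl
use strict;
use warnings;

# Collect the elements named "date" inside the current form.
# Assumption: $mech offers xpath( $query, %options ) with a "node" option;
# this is not verified here against a live browser.
sub find_date_fields {
    my ($mech) = @_;
    return $mech->xpath(
        './/*[@name="date"]',           # the query string holds only XPath
        node => $mech->current_form,    # options are passed as separate arguments
    );
}

# Hypothetical stand-in that records the call, so the sketch runs without Chrome.
package Demo::RecordingMech {
    sub new          { return bless { calls => [] }, shift }
    sub current_form { return 'FORM-NODE' }
    sub xpath {
        my ( $self, $query, %opt ) = @_;
        push @{ $self->{calls} }, { query => $query, %opt };
        return ( 'radio-1', 'radio-2', 'radio-3' );    # pretend matches
    }
}

my $stub   = Demo::RecordingMech->new;
my @fields = find_date_fields($stub);
print scalar(@fields), " fields; query was $stub->{calls}[0]{query}\n";
```

If selector() is preferred, the CSS counterpart of this query would be `input[name="date"]`, again with `node` passed as a separate argument rather than inside the string.
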
