PerlMonks  

Re^3: running an example script with WWW::Mechanize* module

by Aldebaran (Curate)
on Apr 30, 2020 at 04:32 UTC ( [id://11116259]=note )


in reply to Re^2: running an example script with WWW::Mechanize* module
in thread running an example script with WWW::Mechanize* module

marto, thx for your reply...your script works right out of the gate. I believe this is the second time I've followed Mojo::-based scripts you've posted, and I've found them useful both times. Also, thanks for the tip to have a gander on CPAN. It was interesting for me to look at the source, and clearly, if I wanted to pursue IMDB further, that would be the route.

Let me tell you why I would like to shift away from IMDB. It's that when I save the DOM file to disk, it's 2.2 megs, which is far more than I can lay eyes on and understand. Machines get it with the help of JavaScript, but I am only intermediate at best in my understanding of any of the matters I am writing about now.

I spent some time looking at Mojo:: beginning with:

    #!/usr/bin/env perl
    use Mojolicious::Lite;

    get '/' => sub {
      my $c = shift;
      $c->render(text => 'Hello World!');
    };

    app->start;
    __END__

I'm not sure what it all means, but it seems to work and indicate that I have the capabilities I might expect.

    $ ./1.mojo.pl daemon
    [2020-04-25 19:20:12.17317] [2970] [info] Listening at "http://*:3000"
    Server available at http://127.0.0.1:3000
    ^C$

I had a bit of an aha moment when comparing the logs for WMC:

2020/04/29 14:29:32 DEBUG Connecting to ws://127.0.0.1:37749/devtools/browser/58619c4b-5292-4d3a-a0f7-6b69c01c73dc

I don't know how to understand code short of working it and seeing the outcome. So I fiddle around with the examples I can work and then try to re-train the script on a different target, a smaller one that I might be able to understand and that furthers my goals with what I want to do for web automation.

"If you would prefer some sort of web interface to the results wrap the above around Mojolicious::Lite"

I'm hoping that I can get an event clicked on this page, maybe with the Mojo:: family of tools. It's the site I've always gone to for ephemeris data, and the link is already set for Portland, OR. I'm looking to get the radio button for Julian day pressed and the value for jd populated by:

my $julian_day = 2458960;

, updated, and then I want to extract all the values from the table, but with a particular emphasis on getting whether the Sun is up or not at that precise time.

Corion and I have been trying to crack this with WMC, and we're not quite there. Here's how this button looks when I ask google chrome's inspector about it:

    full XPath: /html/body/form/center/table/tbody/tr[1]/td/table/tbody/tr[3]/td[1]/input
    selector: body > form > center > table > tbody > tr:nth-child(1) > td > table > tbody > tr:nth-child(3) > td:nth-child(1) > a
    XPath: /html/body/form/center/table/tbody/tr[1]/td/table/tbody/tr[3]/td[1]/a
    JSPath: document.querySelector("body > form > center > table > tbody > tr:nth-child(1) > td > table > tbody > tr:nth-child(3) > td:nth-child(1) > a")

Another representation: I think this is what the DOM looks like after being dumped:

          [
            "tag",
            "input",
            { name => "date", onclick => 0, type => "radio", value => 2 },
            'fix',
          ],
          ["text", " ", 'fix'],
          [
            "tag",
            "a",
            { href => "/yoursky/help/controls.html#Julian" },
            'fix',
            ["text", "Julian day:", 'fix'],
          ],
          ["text", "\n", 'fix'],
        ],
        ["text", "\n", 'fix'],
        [
          "tag",
          "td",
          {},
          'fix',
          ["text", "\n", 'fix'],
          [
            "tag",
            "input",
            {
              name => "jd",
              onchange => "document.request.date[2].checked=true;",
              size => 20,
              type => "text",
              value => 2458963.36684,
            },

Put simply, can Mojolicious do this?

Replies are listed 'Best First'.
Re^4: running an example script with WWW::Mechanize* module
by marto (Cardinal) on Apr 30, 2020 at 10:31 UTC

    "Let me tell you why I would like to shift away from IMDB. It's that when I save the dom file to disc, it's 2.2 megs, which is nowhere near what I can lay eyes on and understand. Machines get it with the help of javascript, but I am only intermediate at best in my understanding of any of the matters I am writing about now."

    One of the nice things about Mojo::DOM is the support for CSS selectors (see the Mojo docs section Learning Web Technologies). You don't have to figure these out for yourself; you can use your browser's 'developer tools' GUI to click on things and copy their CSS selector/path. Searching for a tutorial for whatever browser you use should produce many videos/tutorials demoing this sort of thing. The copied selectors aren't always optimal; just looking at the HTML source can often point to much shorter selectors. Mojo::UserAgent makes it fairly simple to send data to web interfaces, and the return object contains the resulting DOM (->res->dom above), which you can then use to display/capture whatever data you like. Give it a shot and let me know if you have any problems.
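
    As a rough illustration of the selector point, here is a minimal sketch that runs Mojo::DOM against a small inline HTML fragment rather than the live site. The fragment is a hypothetical, cut-down stand-in built from the attributes shown in the dump earlier in the thread (name, type, value, href); the real page is larger and may be structured differently, so treat the selectors as a starting point only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;

# ASSUMPTION: a hand-written stand-in for the relevant part of the form,
# based only on the attributes visible in the dump; not the real page source.
my $html = q{
<form name="request">
  <table>
    <tr>
      <td><input type="radio" name="date" value="2"> <a href="/yoursky/help/controls.html#Julian">Julian day:</a></td>
      <td><input type="text" name="jd" size="20" value="2458963.36684"></td>
    </tr>
  </table>
</form>
};

my $dom = Mojo::DOM->new($html);

# Attribute selectors tend to be much shorter than the nth-child chains
# that a browser's "copy selector" feature produces.
my $radio = $dom->at('input[type="radio"][name="date"][value="2"]');
my $jd    = $dom->at('input[name="jd"]');

# at() returns undef when nothing matches, so check before using the result.
die "expected inputs not found\n" unless $radio && $jd;

print "radio value: ", $radio->attr('value'), "\n";
print "jd value:    ", $jd->attr('value'), "\n";

# find() returns a Mojo::Collection; map('text') calls ->text on each element.
my @labels = $dom->find('td a')->map('text')->each;
print "label: $_\n" for @labels;
```

    The same at() and find() calls should work on the object returned by ->res->dom. Note that this only reads the form; it does not press the radio button or submit anything.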

      One of the nice things about Mojo::DOM...

      I hadn't been looking there but found at the bottom a simple way to get the DOM into lexical perl that guys like me can understand. I don't get any buttons pushed here, but I'm so pleased with this script that I'm gonna post it. It represents my best achievement yet in getting the DOM information in a format I can read and not blowing me out on STDOUT using Data::Dump.

      $ ./3.mojo_fermi.pl >3.txt
      Wide character in print at /usr/local/share/perl/5.26.1/Log/Log4perl/Appender/File.pm line 313.
      Wide character in print at /usr/local/share/perl/5.26.1/Log/Log4perl/Appender/Screen.pm line 41.
      $ cat 3.mojo_fermi.pl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Mojo::URL;
      use Mojo::Util qw(dumper);
      use Mojo::UserAgent;
      use Data::Dump;
      use Log::Log4perl;
      use 5.016;
      use Mojo::DOM;

      my $log_conf3 = "/home/hogan/Documents/hogan/logs/conf_files/3.conf";
      my $log_conf4 = "/home/hogan/Documents/hogan/logs/conf_files/4.conf";

      #Log::Log4perl::init($log_conf3); #debug
      Log::Log4perl::init($log_conf4);    #info
      my $logger = Log::Log4perl->get_logger();
      $logger->info("$0");

      my $site =
      'https://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=45.5183&ns=North&lon=122.676&ew=West';

      # pretend to be a browser
      my $uaname =
      'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
      my $ua = Mojo::UserAgent->new;
      $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
      $ua->transactor->name($uaname);

      # find search results
      my $dom = $ua->get($site)->res->dom;

      # dd $dom; #overwhelms STDOUT
      say "===========";
      my @nodes = @$dom;

      # c-style for is good for array output with index
      for ( my $i = 0 ; $i < @nodes ; $i++ ) {
        $logger->info("i is $i ==============");
        $logger->info("$nodes[$i]");
      }

      sleep 2;    #good hygiene
      __END__
      $

      I would excerpt my beautiful, straight, demarcated logs, but they're covered in symbols that won't render well here.

      Give it a shot and let me know if you have any problems.

      Thx, marto, I'll keep after it....
