 
PerlMonks  

Re^5: Trouble with some of IDDB Public Methods

by marto (Cardinal)
on Jan 01, 2021 at 09:54 UTC ( [id://11126076]=note )


in reply to Re^4: Trouble with some of IDDB Public Methods
in thread Trouble with some of IMDB Public Methods

"I certainly hope that we don't optimize away the comments and break up the logic as opposed to having just a train of arrows that online sources may have, with words whose provenance is unknown, like top in this example:"

    # JSON POST (application/json) with TLS certificate authentication
    my $tx = $ua->cert('tls.crt')->key('tls.key')->post('https://example.com' => json => {top => 'secret'});

"or json, there's nothing that makes keywords stand out, and where does one go to determine their provenance? How exactly are you going to disambiguate 'json'?"

As with the cert attribute, just look at the post documentation. It's just encoding a Perl value to JSON, and posting it to an example site with TLS cert auth. Consider the longhand example of just the JSON part:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::JSON qw(encode_json);
    use feature 'say';

    my $bytes = encode_json( { top => 'secret' } );
    say $bytes;

Following the appropriate links in the Mojo docs takes you to the relevant places.
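To see what the json keyword does inside post without contacting any server, here is a minimal sketch that only builds the transaction and inspects the request. It assumes Mojolicious is installed; build_tx prepares the request but does not send it, and example.com is just the placeholder host from the quoted snippet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Build (but do not send) the same kind of request as the quoted example;
# 'json' names the content generator that encodes the Perl hash
my $tx = $ua->build_tx( POST => 'https://example.com' => json => { top => 'secret' } );

# The generator sets the Content-Type header and the encoded body
say $tx->req->headers->content_type;
say $tx->req->body;
```

The body printed here is the same encoding that encode_json produces in the longhand version, so 'json' is a generator name and 'top' is simply a hash key chosen for the example.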

"I couldn't get titles with multiple words to work at all. The search replaces spaces with plusses in the url, but interpolation with a lexical variable is just beneath mojo, even if it worked, which it doesn't. What I want is a script that shows me what's at this site from a mojo point of view, and this does so naively:"

A lazy way (since it's early on New Year's Day) would be to take my example, prompt for a film title and replace spaces with the plus sign. If you want to go down the route of automating forms, as mentioned before, make life easy on yourself and use the browser 'developer tools' to find the data you need for the form fields you care about. This is more effective than grepping in the dark from dumped results.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';
    use Mojo::URL;
    use Mojo::Util qw(trim);
    use Mojo::UserAgent;

    my $imdburl = 'https://www.imdb.com/search/title?title=';

    # prompt for title, replace spaces with plus signs
    say 'Enter name of film to search for: ';
    my $film = <STDIN>;
    chomp $film;
    $film =~ s/ /+/g;
    $imdburl .= $film;

    # pretend to be a browser
    my $uaname = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36';
    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5)->connect_timeout(20)->request_timeout(20);
    $ua->transactor->name( $uaname );

    # find search results
    my $dom = $ua->get( $imdburl )->res->dom;
    #my $dom = $ua->post( $imdburl => form => {title => $film} )->res->dom;

    # assume first match
    my $filmurl = $dom->find('a[href^=/title]')->first->attr('href');

    # extract film id
    my $filmid = Mojo::URL->new( $filmurl )->path->parts->[-1];

    # get details of film
    $dom = $ua->get( "https://www.imdb.com/title/$filmid/" )->res->dom;

    # print film details
    say 'Search Results';
    say trim( $dom->at('div.title_wrapper > h1')->text ) . ' (' . trim( $dom->at('#titleYear > a')->text ) .')';

    # print actor/character names
    foreach my $cast ( $dom->find('table.cast_list > tr:not(:first-child)')->each ){
        say trim ($cast->at('td:nth-of-type(2) > a')->text ) . ' as ' . trim ( $cast->at('td.character')->all_text );
    }

Outputs:

This example differs from my original by only a few verbose lines; again, it is suboptimal and intended just to get you started. Obviously this is aimed at films, and if you search for a series rather than a film the resulting page has differences that you'd need to cater for. If your intention is to take this further I'd strongly recommend using the browser developer tools. Don't get hung up on how Mojo can dump the page data and all its elements; this is mostly unimportant if you just want to automate an existing interface. Next steps would be adding code to cater for different types of results (film, TV show), obvious error checking, and perhaps better prompting of results rather than assuming the first one is what the user means, e.g. a search for 'Batman' returns "The Batman (2022)" rather than "Batman (1966)".
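A rough sketch of two of those follow-ups, under stated assumptions: letting Mojo::URL escape a multi-word title instead of substituting by hand, and listing every candidate rather than assuming the first match. The HTML below is a made-up stand-in for a results page (the titles and ids are invented) so the sketch runs offline; the real markup should be checked with the browser developer tools.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;
use Mojo::URL;

# Let Mojo::URL escape the title when it builds the query string
my $film = 'The Dark Knight';
my $url  = Mojo::URL->new('https://www.imdb.com/search/title')->query( title => $film );
say $url;

# Hypothetical results markup, used here instead of a live request
my $html = <<'HTML';
<div><a href="/title/tt0000001/">First Example (2001)</a></div>
<div><a href="/title/tt0000002/">Second Example (1999)</a></div>
HTML

# Collect every candidate link together with its id
my @matches;
for my $link ( Mojo::DOM->new($html)->find('a[href^="/title"]')->each ) {
    my $id = Mojo::URL->new( $link->attr('href') )->path->parts->[-1];
    push @matches, { id => $id, name => $link->text };
}

# Basic error check: stop early when nothing matched
die "No results found for '$film'\n" unless @matches;

# List the candidates so the user can pick one by number
say "$_: $matches[$_]{name} ($matches[$_]{id})" for 0 .. $#matches;
```

From here a prompt could read the chosen number from STDIN and fetch the matching title page, as the script above does for the first match.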

Update: added spoiler tag explanation.
