PerlMonks  

login to a secure web site via JavaScript

by wmfs (Acolyte)
on May 15, 2019 at 10:44 UTC

wmfs has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible, with Perl, to log in to a secure web site whose login is driven by JavaScript?

As a subscriber to The New Yorker, I have access to their archive of all back issues. To access them I click a Login button, which invokes a JavaScript call (javascript:RVViewers[0].subscription.doShowLoginPanel ?) that displays a new window to accept my username/password.

I had hoped that by logging in via Safari on my MacBook Pro, I could 'piggy-back' on that session in a Perl script to download all pages of a given issue:

    # Logout 'on click' button
    # href="javascript:RVViewers[0].subscription.DoLogout(false)"
    # Login 'on click' button
    # href="javascript:RVViewers[0].subscription.doShowLoginPanel()"

    use LWP::UserAgent;

    for($page = 1; $page < 200; $page++) {
        $number = sprintf("%07d", $page);
        $url = "https://archives.newyorker.com/rvimageserver/Conde%20Nast/New%20Yorker/1948_02_07/page$number.jpg";
        $ua = LWP::UserAgent->new;
        $req = HTTP::Request->new(GET => $url);
        # In Safari address box - javascript:prompt('your%20agent%20string%20is',navigator.userAgent)
        $ua->agent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1 Safari/605.1.15');
        $ua->credentials('archives.newyorker.com:80','web_server_usage_reports','user' => 'password');
        $res = $ua->request($req);
        if ($res->is_success) {
            $number = sprintf("%03d", $page);
            open IMAGES, ">///Users/wmfs/New Yorker (1948-02-07)/Page $number.jpg" or die;
            $result = $res->content;
            print IMAGES $result;
            print $number . " okay\n";
        } else {
            print $url . "\n";
            print $number . "\n";
            print $res->status_line . "\n";
            die;
        }
    }

Unfortunately, after the first two pages (which appear to be available even without a subscription), I face the dreaded "403 Forbidden" message.

Any help would be very welcome.

Thank you, Bill Seabrook

Replies are listed 'Best First'.
Re: login to a secure web site via JavaScript
by daxim (Curate) on May 15, 2019 at 11:16 UTC
    There are two fundamental problems.

    1. Use of the credentials method is wrong: it is for HTTP authentication, but archives.newyorker.com authenticates via a Web form.

    2. Your idea of what constitutes a session differs from the Web site's. The User-Agent header is clearly not enough; most sites carry a session in cookies. You would need to load the Safari cookies with HTTP::Cookies::Safari and then feed that cookie jar into LWP::UserAgent.
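
    To illustrate the second point, here is a minimal sketch of attaching a session cookie to LWP::UserAgent. It uses the plain HTTP::Cookies class rather than HTTP::Cookies::Safari, and fills the jar by hand. The cookie name SESSIONID and the value abc123 are placeholders, not the site's real cookie; the real name and value would have to be copied from the logged-in browser. The helper build_agent is likewise hypothetical.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use LWP::UserAgent;

# ASSUMPTION: the cookie name and value below are placeholders. Copy the
# real ones from the browser (e.g. its developer tools) after logging in.
my %session_cookie = (
    name   => 'SESSIONID',
    value  => 'abc123',
    domain => 'archives.newyorker.com',
    path   => '/',
);

# Hypothetical helper: build a user agent whose cookie jar already
# holds the session cookie.
sub build_agent {
    my (%c) = @_;
    my $jar = HTTP::Cookies->new;

    # Arguments: version, key, value, path, domain, port,
    #            path_spec, secure, maxage (seconds), discard
    $jar->set_cookie(0, $c{name}, $c{value}, $c{path}, $c{domain},
                     undef, 1, 1, 3600, 0);

    return LWP::UserAgent->new(cookie_jar => $jar);
}

my $ua = build_agent(%session_cookie);

# Inspect the header the jar would add, without touching the network.
my $req = HTTP::Request->new(
    GET => 'https://archives.newyorker.com/rvimageserver/Conde%20Nast/New%20Yorker/1948_02_07/page0000003.jpg'
);
$ua->cookie_jar->add_cookie_header($req);
print "Cookie header: ", ($req->header('Cookie') // '(none)'), "\n";
```

    With a jar attached this way, every $ua->request($req) call sends the cookie automatically, so the user agent should be created once outside the page loop. Whether the cookie alone is enough to get past the 403 depends on what else the site checks, which this sketch does not know.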

    If that doesn't work (which is likely, since LWP has not kept up at all with Web APIs in current use), try WWW::Scripter or WWW::Mechanize::Chrome instead.
