Re: downloading a file on a page with javascript


Re^2: downloading a file on a page with javascript
by Aldebaran (Curate) on Apr 06, 2020 at 22:33 UTC

There are at least two ways to approach this.

I was particularly pleased to see this response from bliako, whose pm posts are at a level where I can, about half the time, stretch to replicate, understand, and incorporate them into "my game," whatever that is. I was thinking there should be several ways that perl could do this, whether natively, by wrapping C, or with modules. Getting the url right needs to be a part of any solution.

The first is to use WWW::Mechanize::Chrome

I had trouble installing WWW::Mechanize::Chrome, but it was all of the variety where I only needed to make better web searches for prerequisites.

The first "problem" was getting WWW::Mechanize::Chrome to install on Ubuntu. I lacked two things at the beginning: a Chrome executable and the headers for png.h.

For Ubuntu, a good command line install for Chrome is here. Since being able to save a screenshot as a png is necessary, I also needed:

sudo apt-get install libpng-dev

This is as far as I got along this prong. Output, then source:

$ ./1.mai.pl 

                        enable1.txt
                      
Yay

#!/usr/bin/perl

use strict;
use Log::Log4perl qw(:easy);
use WWW::Mechanize::Chrome;
use Data::Dump;
use 5.016;
 
my $mech = WWW::Mechanize::Chrome->new();
my $url = 'https://code.google.com/archive/p/dotnetperls-controls/downloads';
 
$mech->get($url);
 
 
print $_->text . "\n"
    for $mech->find_all_links( text_regex => qr/enable/i );

$mech->follow_link( xpath => '//a[text() = "enable1.txt"]' );

my @words;
# check the outcome
if ($mech->success) {
   #print $res->decoded_content;
   #@words = mech->decoded_content;
   print "Yay\n";
}
else {
   print "Error: " . $mech->status . "\n";
}

if (@words) {
print "@words\n";

}

sleep 1;

According to Corion's presentation from 2017 (at the 35:06 mark), aspects of downloads are yet to be implemented.

Q1) How do I bridge the gap from $mech->follow_link to populating @words ?
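One possible way across that gap, offered as a minimal sketch rather than a tested answer: skip follow_link, take the absolute url of the link that find_all_links already found, and fetch it with an ordinary HTTP client. The helper names words_from_body and fetch_words are made up for illustration, HTTP::Tiny is swapped in only because it ships with perl, and the sketch assumes the link's href points directly at the text file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module; https urls also need IO::Socket::SSL installed

# Hypothetical helper: split a response body into one word per line,
# accepting both LF and CRLF line endings and dropping empty lines.
sub words_from_body {
    my ($body) = @_;
    return grep { length } split /\R/, $body;
}

# Hypothetical helper: fetch a url and return its words, or die on failure.
sub fetch_words {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get("$url");
    die "Error: $res->{status} $res->{reason}\n" unless $res->{success};
    return words_from_body( $res->{content} );
}

# Assumed usage with the $mech object from the script above
# (assumption: the href of the link is the file itself):
#   my ($link) = $mech->find_all_links( text_regex => qr/enable1\.txt/ );
#   my @words  = fetch_words( $link->url_abs );
```

The browser is then only needed to render the javascript and reveal the link; the download itself goes through a plain request.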

The second is to open the site with your browser and open the developer tools (Firefox here, but other browsers have similar functionality). Go to the network tab, select XHR and reload the page. You will see all the data fetched via ajax, and you will see where that data comes from: urls just like the one you tried to download. Copy that url as cURL (it's on the right-click menu somewhere) and you can see exactly what the url is and what its parameters are. Now note the url, its parameters, whether it is a POST or a GET, and what request headers it has. It's easy to translate those into LWP::UserAgent.

I did something close to this dozens of different ways. What ended up working for me was left-clicking on the link while the developer tools, including the network tab, are open, and then finding "Copy as cURL" on the right-click menu while hovering over the request in the tools. This yields:

curl 'https://www.googleapis.com/storage/v1/b/google-code-archive/o/v2%2Fcode.google.com%2Fdotnetperls-controls%2Fproject.json?alt=media&stripTrailingSlashes=false' \
  -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  --compressed \
  -H 'Origin: https://code.google.com' \
  -H 'Connection: keep-alive' \
  -H 'Referer: https://code.google.com/archive/p/dotnetperls-controls/downloads' \
  -H 'Cache-Control: max-age=0' \
  -H 'TE: Trailers'

Then I turned to Corion's curl2lwp converter. I'm super pleased by this:

$ ./2.curl.pl | tail -5
zymotic
zymurgies
zymurgy
zyzzyva
zyzzyvas
$ cat 2.curl.pl 
#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;

my $ua = LWP::UserAgent->new( 'send_te' => '0' );
my $r  = HTTP::Request->new(
    'GET' =>
      'https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/dotnetperls-controls/enable1.txt',
    [
        'Connection' => 'keep-alive',
        'Accept' =>
          'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding' => 'gzip, x-gzip, deflate, x-bzip2, bzip2',
        'Accept-Language' => 'en-US,en;q=0.5',
        'Host'            => 'storage.googleapis.com:443',
        'Referer' =>
          'https://code.google.com/archive/p/dotnetperls-controls/downloads',
        'User-Agent' =>
          'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0',
        'Upgrade-Insecure-Requests' => '1',
    ],
);
my $res = $ua->request( $r, );

### begin Aldebaran-added source
my @words;

# check the outcome
if ( $res->is_success ) {
    # decoded_content returns the whole body as a single string,
    # so split it into one word per element
    @words = split /\R/, $res->decoded_content;
}
else {
    print "Error: " . $res->status_line . "\n";
}

# print one word per line
print "$_\n" for @words;

__END__
$

This represents a huge learning curve, partially ascended, for me, including considering the bigger picture with an introduction to the DOM.

I have one more question at this point, regarding the practice scripts at examples, all of which use Log::Log4perl. If I have:

$ cat /etc/2.log.conf
###############################################################################
#                              Log::Log4perl Conf                             #
###############################################################################
log4perl.rootLogger              = DEBUG, LOG1, SCREEN
log4perl.appender.SCREEN         = Log::Log4perl::Appender::Screen
log4perl.appender.SCREEN.stderr  = 0
log4perl.appender.SCREEN.layout  = Log::Log4perl::Layout::PatternLayout
log4perl.appender.SCREEN.layout.ConversionPattern = %m %n
log4perl.appender.LOG1           = Log::Log4perl::Appender::File
log4perl.appender.LOG1.filename  = /home/hogan/Documents/hogan/logs/2.log4perl.txt
log4perl.appender.LOG1.mode      = append
log4perl.appender.LOG1.layout    = Log::Log4perl::Layout::PatternLayout
log4perl.appender.LOG1.layout.ConversionPattern = %d %p %m %n
$

, and this successfully logs events and errors:

#!/usr/bin/perl
use Log::Log4perl;
# Initialize Logger
my $log_conf = "/etc/2.log.conf";
Log::Log4perl::init($log_conf);
my $logger = Log::Log4perl->get_logger();
$logger->info("===== before system call");
system('ls -l qwerty');
if( $? > 0 ) {
    $logger->error("there was an error: $?");
}
$logger->info("===== after system call");

Q2) How do I log using this scheme? For example, do I go from

else {
   print "Error: " . $mech->status . "\n";
}

to:

else {
   $logger->error("there was an error: $mech->status" . "\n") ;
}
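One thing to watch in this kind of conversion, shown here as a small sketch and not as the definitive answer: perl does not interpolate a method call such as $mech->status inside a double-quoted string, so the message is better built by concatenation. Also, a ConversionPattern ending in %n already supplies the newline. The My::FakeMech class and the error_message helper below are stand-ins invented for illustration so the snippet runs without a browser; easy_init replaces the config file for the same reason.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Log::Log4perl qw(:easy);

# Stand-in for the real $mech object (hypothetical, for illustration only).
package My::FakeMech {
    sub new    { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub status { return $_[0]{status} }
}

# Hypothetical helper: build the message by concatenation.
# "there was an error: $mech->status" would interpolate only $mech,
# giving something like "My::FakeMech=HASH(0x...)->status".
sub error_message {
    my ($mech) = @_;
    return "there was an error: " . $mech->status;
}

# In the real script this would be Log::Log4perl::init($log_conf).
Log::Log4perl->easy_init($ERROR);
my $logger = Log::Log4perl->get_logger();

my $mech = My::FakeMech->new( status => 404 );
$logger->error( error_message($mech) );    # no trailing "\n" needed
```

In the original script the else branch would then read $logger->error( "there was an error: " . $mech->status ); with $logger initialized from the config file as in the logging example above.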

Again, thanks all for comments, which seem to be the "service work" that most of us can do in these unusual times of "social distancing." Stay healthy!

2020-04-07 Athanasius fixed formatting of over-long code line.
