Re: Running JavaScript from within Perl (or just use the API)
by hippo (Bishop) on Sep 13, 2019 at 08:17 UTC
Can you offer guidance
Did you know that (or even consider whether) WordPress has an API? Not only that, but there is already a whole range of modules on CPAN which use it. Perhaps the ability to retrieve the follower count is available via that API and will save you all this scraping and javascripting and whatnot.
I tried following the "A Beginners’s Guide to the WordPress REST API" tutorial.
It didn't work for my own (free) WordPress account, but when I used "the-art-of-autism.com" (a premium account on which I have admin privileges) in place of "yourdomain.com", I was able to follow the tutorial successfully.
However, none of the Routes or Endpoints seem to give me what I want, which is the number of followers for an arbitrary WordPress account on which I don't have admin privileges. I'm encouraged by the REST API Handbook Reference page stating "The REST API provides public data accessible to any client anonymously, as well as private data only available after authentication."
I can't find any way to determine the number of followers, or what public data is accessible anonymously. Can you help with either of those? Thanks.
https://developer.wordpress.com/docs/api/1.1/get/sites/$site/stats/followers/
... but you need to be authenticated:
curl "https://public-api.wordpress.com/rest/v1.1/sites/the-art-of-autism.com/stats/followers"
{"error":"unauthorized","message":"user cannot view stats"}
So, you will either have to get permission from the respective sites or you will have to continue scraping the websites.
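Note that an unauthorized request still comes back as JSON, just with an `error` key instead of the data, so a script should check for that before digging into the result. A minimal sketch in core Perl (JSON::PP ships with Perl 5.14 and later), using the error body quoted above; `api_error` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Return "error: message" if a WordPress.com API response body is an
# error object, or undef if it looks like ordinary data.
sub api_error {
    my ($body) = @_;
    my $data = decode_json($body);
    return undef unless ref $data eq 'HASH' && exists $data->{error};
    return "$data->{error}: " . ($data->{message} // '');
}

# The body returned by the unauthenticated stats/followers request above.
my $body = '{"error":"unauthorized","message":"user cannot view stats"}';
if (my $err = api_error($body)) {
    print "API refused the request ($err)\n";
}
```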
(Updated and clarified) The following endpoint:
https://public-api.wordpress.com/rest/v1/read/feed/?url=the-art-of-autism.com
contains a "feed" url:
https://public-api.wordpress.com/rest/v1/read/feed/34259929
that I want to read.
The following code (based on this JSON Tutorial) gives an error "Use of uninitialized value $feedurl in print".
use strict;
use warnings;
use Mojo::UserAgent;
my $url =
'https://public-api.wordpress.com/rest/v1/read/feed/?url=the-art-of-autism.com';
my $ua = Mojo::UserAgent->new;
my $feedurl = $ua->get( $url )->result->json->{'feeds.meta.links.feed'};
print $feedurl;
Please tell me what I'm doing wrong. Thanks.
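The likely problem is that `->{'feeds.meta.links.feed'}` asks for one hash key whose name literally contains dots; Perl does not treat the dots as a path, so the lookup returns undef. Each level has to be dereferenced separately, and since `feeds` appears to be an array, it needs an index too. A sketch with JSON::PP and a hypothetical, trimmed-down response: the nesting `feeds[0].meta.links.feed` is an assumption based on the key you used, so check it against the real output (e.g. with Data::Dumper). With Mojo the equivalent would be `$ua->get($url)->result->json->{feeds}[0]{meta}{links}{feed}`.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample: the real response has many more fields, and its
# exact nesting is an assumption to verify against the actual output.
my $json_text = '{"feeds":[{"meta":{"links":{"feed":'
              . '"https://public-api.wordpress.com/rest/v1/read/feed/34259929"}}}]}';

my $data = decode_json($json_text);

# Wrong: looks up a single key whose name literally contains dots.
my $wrong = $data->{'feeds.meta.links.feed'};

# Right: walk the structure one level at a time; "feeds" is an array.
my $feedurl = $data->{feeds}[0]{meta}{links}{feed};

print "dotted key: ", (defined $wrong ? $wrong : 'undef'), "\n";
print "feed URL: $feedurl\n";
```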
use strict;
use warnings;
use Mojo::UserAgent;
my $url =
'https://public-api.wordpress.com/rest/v1/read/feed/34259929';
my $ua = Mojo::UserAgent->new;
my $subscribers = $ua->get($url)->result->json->{subscribers_count};
print "Number of subscribers: $subscribers\n";
my $feedurl = $ua->get( $url )->result->json->{'meta.links.self'};
print $feedurl;
Please tell me what I'm doing wrong. Thanks.
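The same issue applies here: `'meta.links.self'` is one literal key. Mojo::Message's `json` method accepts an optional JSON Pointer (RFC 6901), so `$ua->get($url)->result->json('/meta/links/self')` should return the nested value directly. To show what such a pointer does, here is a small core-Perl resolver; `json_pointer` is a hypothetical helper written for illustration (Mojolicious itself uses Mojo::JSON::Pointer), and the sample document is made up to mimic the shape the key suggests.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Resolve an RFC 6901 JSON Pointer such as "/meta/links/self" against
# decoded JSON. Returns undef if any step of the path is missing.
sub json_pointer {
    my ($data, $pointer) = @_;
    return $data if $pointer eq '';               # "" means the whole document
    return undef unless $pointer =~ s{^/}{};
    my @tokens = length $pointer ? split(m{/}, $pointer, -1) : ('');
    for my $token (@tokens) {
        $token =~ s/~1/\//g;    # "~1" encodes "/"
        $token =~ s/~0/~/g;     # "~0" encodes "~" (decoded after "~1")
        if (ref $data eq 'HASH') {
            return undef unless exists $data->{$token};
            $data = $data->{$token};
        }
        elsif (ref $data eq 'ARRAY') {
            return undef unless $token =~ /^(?:0|[1-9]\d*)\z/ && $token < @$data;
            $data = $data->[$token];
        }
        else {
            return undef;
        }
    }
    return $data;
}

# Hypothetical sample shaped like the feed response; values are made up.
my $doc = decode_json('{"subscribers_count":42,'
    . '"meta":{"links":{"self":"https://public-api.wordpress.com/rest/v1/read/feed/34259929"}}}');

print json_pointer($doc, '/meta/links/self'), "\n";
print json_pointer($doc, '/subscribers_count'), "\n";
```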
Re: Running JavaScript from within Perl
by haukex (Archbishop) on Sep 13, 2019 at 05:34 UTC
Naively it seems to me that since my browser can interpret a web page using JavaScript without any a priori information, Perl should be able to as well. Is this possible? If not, why not?
JavaScript has access to a ton of things implemented in the browser, like the HTML document's DOM, various JavaScript APIs, and so on. To run JS code correctly, Perl would need to provide all of those, essentially re-implementing a browser, which is of course incredibly complex. See also the "JavaScript" section in WWW::Mechanize::FAQ.
(For the general case of running JS from Perl, there was a talk in Riga: Embedding JavaScript in Perl.)
Re: Running JavaScript from within Perl
by Marshall (Canon) on Sep 13, 2019 at 02:43 UTC
Perl can't run JavaScript itself. One solution is to use WWW::Mechanize::Chrome. Previously it was possible to automate Firefox, and I played with that, but unfortunately Firefox took out the interface that allowed the automation to happen. I haven't used the Chrome version yet. Anyway, the idea is to have Perl control Chrome, which will run the JavaScript code. Then you read what Chrome figured out.
Re: Running JavaScript from within Perl
by harangzsolt33 (Chaplain) on Sep 14, 2019 at 05:10 UTC
The JavaScript program on a web page can dynamically modify the page, so what you see may bear little or no resemblance to the HTML source code! So if you can scrape your web page using JavaScript, you get a peek at what's actually on the screen.
Here is an example. When you click on the "View HTML" button on this page, you'll see one thing. Then click on the "Change" button, which modifies the page, and click on "View HTML" again: you'll see the code with some slight changes. The source code hasn't changed, but what's in memory has changed, and when you harvest that, you get the real picture.
Here is the JavaScript program that harvests the HTML code:
var DATA = document.all[0].innerHTML;
If the block of HTML code you're trying to harvest is marked with an ID attribute like this:
<DIV ID="Part3">
...
OR
<P ID="MyText">
...
OR
<TABLE ID="Table2">
...
then you don't need to harvest the entire HTML page. All you have to do is harvest whatever is tagged. So, you would just do this:
var DATA = document.getElementById("Part3").innerHTML;
Instead of using "innerHTML," you could also use "innerText" which gives you only the plain text without all the HTML tags and whatnot:
var DATA = document.getElementById("Part3").innerText;
Once you have the code in the DATA variable, you can run a regex or something similar to get the actual number you're looking for. JavaScript regular expressions work much like Perl's.
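For illustration, here is that regex step written in Perl (the pattern syntax carries over to JavaScript almost unchanged). The phrase "<number> Followers" is a hypothetical stand-in for whatever the real page shows, so the pattern is an assumption to adapt; `extract_followers` is likewise a made-up name.

```perl
use strict;
use warnings;

# Pull a follower count out of page text. Assumes a hypothetical
# "<number> Followers" phrase, with optional thousands separators.
sub extract_followers {
    my ($text) = @_;
    return undef unless $text =~ /(\d[\d,]*)\s+followers?\b/i;
    (my $count = $1) =~ tr/,//d;    # drop separators: "1,234" -> "1234"
    return $count;
}

my $text = "Welcome\n1,234 Followers\n";
print extract_followers($text), "\n";
```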
<HTML>
<BODY>
<NOSCRIPT>
<DIV STYLE="BACKGROUND-COLOR:RED; COLOR:WHITE; FONT-FAMILY:ARIAL;"><CENTER>This page requires JavaScript.</CENTER>
</DIV>
</NOSCRIPT>
<H3 ID="HEADING">Welcome</H3>
<DIV ID="CONTENT">
<P>This is a very simple HTML page.
<P><INPUT TYPE=BUTTON VALUE=" View HTML " onClick="ViewHTML();">
<INPUT TYPE=BUTTON VALUE=" Change " onClick="DoSomething();">
</DIV>
<SCRIPT>
function ViewHTML() {
var DATA = document.all[0].innerHTML;
alert("This is the page content as seen from JavaScript:\n\n" + DATA);
}
function DoSomething() {
document.getElementById("HEADING").innerHTML = "<FONT COLOR=BLUE>DEAR VISITOR</FONT>";
var MyCONTENT = document.getElementById("CONTENT");
MyCONTENT.innerHTML = "<FONT COLOR=RED>" + MyCONTENT.innerHTML;
}
</SCRIPT>
</BODY>
</HTML>
I tested the above code, and it works in Firefox 52, KMeleon 7.5, QupZilla 1.8.6, Safari 5.1.7, Google Chrome 75, Internet Explorer 6, Opera 7.5, and Vivaldi 1.0. I have also tested it with an iPhone 7, Nokia Lumia 930 Windows Phone and an old Android 6 tablet. I haven't used any "ultra modern technology" that will break your phones. Everything in this example script is pretty standard.
Once you get the number you want to send back to your perl script, you could send it back by loading a picture:
<HTML>
<BODY>
<IMG NAME=PIX6 BORDER=0 WIDTH=1 HEIGHT=1 STYLE="POSITION:ABSOLUTE; TOP:0; LEFT:0;">
<SCRIPT>
var NUMBER = 90;
document.images.PIX6.src = "http://www.yourwebsite.com/yourscript.pl?" + NUMBER;
</SCRIPT>
</BODY>
</HTML>
Here you're sending the number 90 back to your perl script.
You could also signal to your Perl script when somebody loads your web page with JavaScript turned off, by putting an image inside the NOSCRIPT tags. Whatever you put between the NOSCRIPT tags is only rendered when JavaScript is disabled:
<NOSCRIPT>
<IMG SRC="http://www.yourwebsite.com/yourscript.pl?N" BORDER=0 WIDTH=1 HEIGHT=1 STYLE="POSITION:ABSOLUTE; TOP:0; LEFT:0;">
</NOSCRIPT>
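On the receiving end, yourscript.pl (the placeholder name from above) only has to look at the query string: digits for a reported number, "N" for the NOSCRIPT case. A minimal CGI sketch in plain Perl; logging with `warn` (which most web servers send to their error log) and answering with "204 No Content" are choices made here for illustration, not part of the technique described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Interpret the beacon's query string: digits are a number reported by
# the page's JavaScript, "N" means the NOSCRIPT image fired instead.
sub parse_beacon {
    my ($query) = @_;
    $query //= '';
    return { javascript => 1, number => $1 } if $query =~ /^(\d+)\z/;
    return { javascript => 0 } if $query eq 'N';
    return undef;    # anything else is ignored
}

# Only act when running under a web server (CGI sets GATEWAY_INTERFACE).
if ($ENV{GATEWAY_INTERFACE}) {
    my $beacon = parse_beacon($ENV{QUERY_STRING});
    if ($beacon && $beacon->{javascript}) {
        warn "beacon: page reported $beacon->{number}\n";
    }
    elsif ($beacon) {
        warn "beacon: page loaded with JavaScript disabled\n";
    }
    # The 1x1 IMG needs a response but no content.
    print "Status: 204 No Content\n\n";
}
```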
curl 'https://public-api.wordpress.com/rest/v1.1/sites/en.blog.wordpress.com/posts/?number=2'
which I couldn't figure out how to make work.
By contrast, the example provided in A Beginners’s Guide to the WordPress REST API is
curl -X GET -i http://the-art-of-autism.com/wp-json/wp/v2/posts
which does work.
Can you help me reconcile the two (which will hopefully help me interpret the rest of the WordPress REST API documentation)?
Also, do REST API Resources only work on premium WordPress sites? I was able to execute GET /sites/$site/posts/ on the-art-of-autism.com (a premium site) but not on anautismobserver.wordpress.com (a free site). Do you know the reason for this?
I really appreciate your help. You've already saved me a great deal of time and effort (and greatly increased my success chances). Thank you ever so much.
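Part of the confusion is probably that the two examples talk to two different APIs. `public-api.wordpress.com/rest/v1.1/sites/$site/...` is WordPress.com's own REST API: one central host, with the site named in the path. `/wp-json/wp/v2/...` is the REST API built into WordPress itself, served from the site's own domain. A tiny sketch of how each URL is assembled; `posts_urls` is a hypothetical helper and the site name is the one from this thread.

```perl
use strict;
use warnings;

# Build the "list posts" URL for both API styles discussed above.
sub posts_urls {
    my ($site) = @_;
    return (
        # WordPress.com REST API: central host, site named in the path
        dotcom => "https://public-api.wordpress.com/rest/v1.1/sites/$site/posts/",
        # Core WordPress REST API: served by the site itself under /wp-json
        core   => "https://$site/wp-json/wp/v2/posts",
    );
}

my %urls = posts_urls('the-art-of-autism.com');
print "$_: $urls{$_}\n" for sort keys %urls;
```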
Using GET /read/feed/$feed_url_or_id I can generate a web page containing the number of followers shown as "subscribers_count".
How do I read this page into a perl script? I tried HTML::TreeBuilder and got the error message:
https://public-api.wordpress.com/rest/v1/read/feed/http%3A%2F%2Fthe-art-of-autism.com%2Ffeed returned application/json not HTML
Should I use WWW::Mechanize::Chrome, JSON, JavaScript, or something else? How do I provide them input from a URL?
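Since the response is JSON rather than HTML, a JSON decoder is the right tool here, not an HTML parser or a browser. A sketch using only core modules: HTTP::Tiny to fetch (for https URLs it also needs IO::Socket::SSL and Net::SSLeay installed) and JSON::PP to decode. It also builds the /read/feed/ URL by percent-encoding the feed address, which reproduces the URL in the error message above. `uri_escape_all`, `feed_api_url` and `subscribers_from_json` are hypothetical helper names; `subscribers_count` is the field name reported above.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_all {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a /read/feed/ API URL from a feed address.
sub feed_api_url {
    my ($feed) = @_;
    return 'https://public-api.wordpress.com/rest/v1/read/feed/' . uri_escape_all($feed);
}

# Extract subscribers_count from a JSON body; undef if absent or not JSON.
sub subscribers_from_json {
    my ($body) = @_;
    my $data = eval { decode_json($body) };
    return undef unless ref $data eq 'HASH';
    return $data->{subscribers_count};
}

# Usage: perl subscribers.pl http://the-art-of-autism.com/feed
for my $feed (@ARGV) {
    my $url = feed_api_url($feed);
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n" unless $res->{success};
    my $count = subscribers_from_json($res->{content});
    print "$feed: ", (defined $count ? $count : 'no subscribers_count'), "\n";
}
```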
use strict;
use warnings;
use Mojo::UserAgent;
my $filename = 'urls_Mojo.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
my $y = 0; # input row count
while (my $row = <$fh>) {
$y++;
print $y;
print " $row";
my $url = $row;
# create a Mojo::UserAgent
my $ua = Mojo::UserAgent->new;
# use $ua to get the url and assign the value of 'subscribers_count' in the json
# to a variable, $subscribers
my $subscribers = $ua->get( $url )->result->json->{subscribers_count};
# print the variable to screen
print "Number of subscribers: $subscribers\n";
}
It worked when the file 'urls_Mojo.txt' contained
https://public-api.wordpress.com/rest/v1/read/feed/http%3A%2F%2Fthe-art-of-autism.com%2Ffeed
but gave a "Can't use an undefined value as a HASH reference" error when I added a second line to 'urls_Mojo.txt' as follows:
https://public-api.wordpress.com/rest/v1/read/feed/http%3A%2F%2Fthe-art-of-autism.com%2Ffeed
https://public-api.wordpress.com/rest/v1/read/feed/http%3A%2F%2Fanautismobserver.wordpress.com%2Ffeed
Can you help me figure out how to apply this script to a list of URLs in a file? Thanks.
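The most likely culprit is the line ending: `<$fh>` returns each line with its trailing "\n", so once the file has more than one line, every URL but the last carries a newline, the request probably doesn't return the JSON you expect, `->json` gives undef, and `->{subscribers_count}` dies. A sketch of the loop that trims each line (including a stray "\r" from Windows editors), skips blank lines, reuses one Mojo::UserAgent, and checks the response before dereferencing it. Taking the file name from the command line and the `read_urls`/`main` helpers are choices made here, not part of the original script.

```perl
use strict;
use warnings;

# Read one URL per line; trim the line ending and surrounding spaces,
# and skip blank lines.
sub read_urls {
    my ($fh) = @_;
    my @urls;
    while (my $row = <$fh>) {
        $row =~ s/^\s+|\s+\z//g;
        push @urls, $row if length $row;
    }
    return @urls;
}

sub main {
    my ($filename) = @_;
    require Mojo::UserAgent;    # loaded here so read_urls works without Mojolicious
    open(my $fh, '<:encoding(UTF-8)', $filename)
        or die "Could not open file '$filename': $!";
    my $ua = Mojo::UserAgent->new;    # one user agent for all requests
    my $y  = 0;                       # input row count
    for my $url (read_urls($fh)) {
        $y++;
        my $json = $ua->get($url)->result->json;
        if (ref $json eq 'HASH' && defined $json->{subscribers_count}) {
            print "$y $url\nNumber of subscribers: $json->{subscribers_count}\n";
        }
        else {
            warn "$y $url: response has no subscribers_count\n";
        }
    }
    close $fh;
}

# Usage: perl subscribers.pl urls_Mojo.txt
main($ARGV[0]) if @ARGV;
```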
curl --help
I get a list of options. Does that mean I have curl installed? (I don't remember installing it.)
If not, please tell me how to install it from the zip file. Thanks.
I don't mind responses that go beyond the narrow bounds of what I asked. It's like learning a new language: sometimes it's best to immerse myself in the new culture and see what I can absorb.
I like learning new software by following tutorials (though this runs the risk of learning outdated information). I'm starting to work through The Ultimate Guide To The WordPress REST API (written in September 2015 by Josh Pollock). He recommends using Vagrant, VirtualBox, and Git, which I've downloaded and installed on my computer.
Is The Ultimate Guide To The WordPress REST API a good resource (obtained from here)?
Do you know of any better (perhaps newer) tutorials for the WordPress REST API?
"I don't mind responses that go beyond the narrow bounds of what I asked. It's like learning a new language: sometimes it's best to immerse myself in the new culture and see what I can absorb."
The method described in the post you're replying to won't help you achieve what you asked about. If you're interested in learning about JavaScript and HTML/DOM manipulation, there are better resources (from the Mojolicious docs):
"All web development starts with HTML, CSS and JavaScript, to learn the basics we recommend the Mozilla Developer Network. And if you want to know more about how browsers and web servers actually communicate, there's also a very nice introduction to HTTP."
"I've downloaded and installed Vagrant, VirtualBox, and Git for Windows on my computer"
What part of the problem does this solve?
"Is this a good resource? (https://wpengine.com/resources/the-ultimate-guide-to-the-wordpress-rest-api/)"
I've no idea, you need to register to download an ebook.
"Do you know of any better (perhaps newer) tutorials for the WordPress REST API?"
What is missing from the official WordPress documentation?
Update: see Re^3: Running JavaScript from within Perl (or just use the API) and https://developer.wordpress.com/docs/api/1.1/get/sites/%24site/stats/followers/.
Re: Running JavaScript from within Perl
by Anonymous Monk on Sep 13, 2019 at 02:40 UTC
Re: Running JavaScript from within Perl
by FreeBeerReekingMonk (Deacon) on Sep 17, 2019 at 20:20 UTC
Maybe PhantomJS would also work for you? It is a headless browser that can interpret JavaScript. Note that it is almost abandoned.