Re: Parsing a large html with perl

Welcome to the Monastery, zesys!

The top of that page says:

The following is dynamic list of all of the deployments that have data. It is being pulled from the deployments web service using the URL https://data.oceannetworks.ca/api/deployments?method=get&token=[YOUR_TOKEN_HERE]

Why don't you just use that API?

Anyway, if you need to parse HTML, then don't use regular expressions. Here's an example with Mojo::DOM:

use warnings;
use strict;
use Mojo::UserAgent;
use Mojo::DOM;

# Fetch the wiki page, following up to three redirects.
my $ua = Mojo::UserAgent->new( max_redirects=>3 );
my $dom = $ua->get(
    'https://wiki.oceannetworks.ca/display/O2A/Available+Deployments'
    )->result->dom;
# Visit every row of the Confluence table.
$dom->find('.confluenceTable tr')->each(sub {
    my $tr = shift;
    # Collect the text of the first four data cells in this row.
    my ($locationCode, $deviceCode, $dateFrom, $dateTo) = map {
        $tr->find(".confluenceTd:nth-of-type($_)")
            ->map('all_text')->join } 1..4;
    print "locationCode=$locationCode, deviceCode=$deviceCode, ",
        "dateFrom=$dateFrom, dateTo=$dateTo\n";
});

Re^2: Parsing a large html with perl by zesys (Novice) on Jun 04, 2020 at 04:42 UTC
Thanks so much @haukex. I have added two lines of code to yours (had two questions), and problem solved! Regarding the API, I use the service almost every day through client libraries written for Python. I just wanted to do things differently this time by using Perl, for which the organisation does not seem to have a client library. Thank you all for your prompt answers and suggestions!!
Re^3: Parsing a large html with perl by marto (Cardinal) on Jun 04, 2020 at 07:41 UTC
You don't need them to provide a client library in Perl; writing your own is reasonably straightforward. The advantage of using their API is that, generally speaking, an API is less susceptible to change than a web page. Super Search for mojo api will find results to get you started.
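As a rough sketch of what such a hand-written client might look like, here is one built only on core modules (HTTP::Tiny and JSON::PP). The endpoint and the `method`/`token` parameters are the ones quoted from the wiki page at the top of this thread. The names `deployments_url`, `fetch_deployments` and the `ONC_TOKEN` environment variable are made up for illustration, and since the thread does not show what the service returns, the sketch simply hands back whatever the JSON decodes to.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Endpoint as quoted on the wiki page.
my $BASE = 'https://data.oceannetworks.ca/api/deployments';

# Hypothetical helper: build the request URL for a given token.
sub deployments_url {
    my ($token) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode(
        { method => 'get', token => $token } );
    return "$BASE?$query";
}

# Hypothetical helper: fetch and decode the response.
# $http may be any object with an HTTP::Tiny-style get() method,
# which makes the function easy to exercise without a network.
sub fetch_deployments {
    my ($token, $http) = @_;
    $http //= HTTP::Tiny->new( timeout => 30 );
    my $res = $http->get( deployments_url($token) );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json( $res->{content} );
}

# Only contact the service when a token is supplied via the
# (assumed) ONC_TOKEN environment variable. HTTPS with HTTP::Tiny
# additionally needs IO::Socket::SSL installed.
if ( defined $ENV{ONC_TOKEN} ) {
    my $data = fetch_deployments( $ENV{ONC_TOKEN} );
    print "Decoded a ", ( ref $data || 'scalar' ), " from the service\n";
}
```

Passing the HTTP object in as an optional argument is a design choice, not a requirement: it lets you swap in a stub while you work out how to walk the decoded structure.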
Re^2: Parsing a large html with perl by perlfan (Vicar) on Jun 03, 2020 at 04:17 UTC
OP, please do use the API documented at https://wiki.oceannetworks.ca/display/O2A/API+Reference that haukex pointed out. It's an HTTP::Tiny call away (hopefully an https URL is available), and it's JSON! You'll learn a lot and be glad you did. Note: if you do it right, you could get a Perl client listed in there.

Also, see if it'll accept the query string via POST body; be sure to set the Content-Type header of the request to `application/x-www-form-urlencoded`. The reason is that sending your special token in the URL of a GET request is going to get it logged everywhere (`https` encrypts it in transit but does not keep it out of access logs), and sometimes endpoints will accept it just the same as a POST. If it's just `http`, then sending it via POST, if that's accepted, will at least keep the token out of all those logged URLs.

If you insist on parsing the HTML and it really is just a large simple table, take a look at HTML::TableExtract.
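A minimal sketch of the POST idea with HTTP::Tiny follows. Whether this particular endpoint accepts a POST at all is an assumption that nothing in this thread confirms, so treat it as something to try rather than documented behaviour. The `ONC_TOKEN` environment variable is a made-up name for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $url  = 'https://data.oceannetworks.ca/api/deployments';
my %form = (
    method => 'get',
    token  => $ENV{ONC_TOKEN} // 'YOUR_TOKEN_HERE',
);

my $http = HTTP::Tiny->new( timeout => 30 );

# The same key=value pairs that would otherwise sit in the URL.
# This is the string that ends up in the request body.
my $body = $http->www_form_urlencode( \%form );

# Assumption: the service accepts POST. Only try with a real token.
if ( defined $ENV{ONC_TOKEN} ) {
    # post_form() encodes the form data and sends it with
    # Content-Type: application/x-www-form-urlencoded.
    my $res = $http->post_form( $url, \%form );
    print "$res->{status} $res->{reason}\n";
}
else {
    print "No token set; the POST body would be: $body\n";
}
```

With `post_form` there is no need to set the header by hand. If the service answers with an error status, it most likely only supports the GET form shown on the wiki page.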
Re^3: Parsing a large html with perl by marto (Cardinal) on Jun 03, 2020 at 07:49 UTC
Usually makes more sense to reply to OP if that is who you are addressing. Your advice assumes they have API access, which may not be the case. The Mojo solution provided can deal just as easily with a JSON response as the HTML.
Re^3: Parsing a large html with perl by zesys (Novice) on Jun 04, 2020 at 05:16 UTC
Thanks @perlfan. I will try your first suggestion. I admit, as a non-developer, I often find it a daunting task making sense of a JSON response.
Re^4: Parsing a large html with perl [JSON Tips] by kcott (Archbishop) on Jun 06, 2020 at 09:04 UTC
G'day zesys,

Welcome to the Monastery.

"... I often find it a daunting task making sense of a JSON response."

You don't say what aspects of this you find daunting. Here are a few tips.

JSON is often presented as a single string many hundreds or thousands of characters long. I typically find this impossible to read at a glance; no doubt, you do too. The solution is to format that string into a more humanly readable structure. I use "JSON Formatter and Validator" for this; if you don't like that one, there are many others available, so just search for something that better suits you.

Now that you have a readable structure, just think of each '`:`' as a '`=>`' and you have a Perl hashref. That's a slight oversimplification but, in nearly all cases, it will hold true.

    # JSON:
    {
        "string" : "value",
        "array" : [ 1, 2, 3 ],
        "hash" : { "key1" : "val1", "key2" : "val2" }
    }

    # Perl:
    {
        "string" => "value",
        "array" => [ 1, 2, 3 ],
        "hash" => { "key1" => "val1", "key2" => "val2" }
    }

The JSON syntax is actually very simple. It's described, clearly and succinctly, in "Introducing JSON".

If you're not completely familiar with hashrefs, take a look at the Hashes section of "perlintro: Perl variable types". That section (indeed, the entirety of the perlintro page) is peppered with links to more detailed descriptions, additional information, and more advanced, related topics: don't be put off by the idea that this page is just an introduction for complete novices.

There are also a few gotchas which may not be immediately obvious; in some cases, they're highly unintuitive. Here are a couple that have tripped me up in the past:

Valid JavaScript is not necessarily valid JSON. Strings in JSON must be delimited by double-quotes, so `{ "answer": 42 }` is valid in both. These, however, are valid in JavaScript but not in JSON: `{ 'answer': 42 }` and `{ answer: 42 }`.

In Perl, the final element in a list may be optionally followed by a comma; in JSON, that final comma is not allowed. So, `[ 1, 2, 3 ]` is valid in both; however, `[ 1, 2, 3, ]` is valid Perl but invalid JSON.

-- Ken
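To see the tips above in action, here is a small sketch using the core JSON::PP module. It decodes the sample structure from this post into a Perl hashref and then runs the gotcha examples through a strict parser. The helper name `is_valid_json` is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# The sample document from the post, as a single JSON string.
my $json_text = '{ "string" : "value", "array" : [ 1, 2, 3 ], '
              . '"hash" : { "key1" : "val1", "key2" : "val2" } }';

# Decoding gives an ordinary Perl hashref.
my $data = JSON::PP->new->decode($json_text);
print "string:   $data->{string}\n";
print "array[1]: $data->{array}[1]\n";
print "key2:     $data->{hash}{key2}\n";

# Hypothetical helper: true if a strict JSON::PP parser accepts the text.
# decode() dies on malformed input, so eval turns that into a boolean.
sub is_valid_json {
    my ($text) = @_;
    return eval { JSON::PP->new->decode($text); 1 } ? 1 : 0;
}

my @candidates = (
    q({ "answer": 42 }),    # double-quoted key
    q({ 'answer': 42 }),    # single-quoted key
    q({ answer: 42 }),      # bare key
    q([ 1, 2, 3 ]),         # no trailing comma
    q([ 1, 2, 3, ]),        # trailing comma
);

for my $candidate (@candidates) {
    printf "%-20s %s\n", $candidate,
        is_valid_json($candidate) ? 'valid' : 'invalid';
}
```

JSON::PP can be told to tolerate some of these (`relaxed`, `allow_singlequote`, `allow_barekey`), but a service that emits standards-compliant JSON should not need them.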

