Re: Web Scraping with Find / Replace (Mojo::DOM)


PerlMonks


by beech (Parson)

on Dec 01, 2016 at 22:19 UTC


in reply to Web Scraping with Find / Replace

Hi,

#TODO Replace all of the links with fully qualified url's
#TODO Save the master_content to a file with the same file name

Here you go

use Path::Tiny qw/ path /;
path( $newFileName )->spew_utf8( qq{<base href="$insert_str">}, $content );

You might need to HTML-escape $insert_str; you could use Mojo for that part:

$ perl -Mojo -e " $dom = x(q{<base>}); $dom->at(q{base})->attr(qw{href http://example.com/?&}); print $dom "
<base href="http://example.com/?&amp;">

See Path::Tiny, https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base, https://metacpan.org/pod/ojo#x

Re^2: Web Scraping with Find / Replace (Mojo::DOM) by sjfranzen (Initiate) on Dec 02, 2016 at 16:25 UTC
Thank you for your response. Unfortunately, I do not understand your approach or how to include it in my script.
Re^3: Web Scraping with Find / Replace (Mojo::DOM) by beech (Parson) on Dec 04, 2016 at 20:55 UTC
Well, if you add a base tag to the HTML content, there is no need to rewrite relative links into absolute links; it's a shortcut provided by HTML. The spew part of the code does that, using a helper module for creating the file. The second part shows creating/modifying a base tag with Mojo, which will HTML-escape the URL.
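
Putting the two parts together in a script might look roughly like the sketch below. It is only an illustration: the values assigned to $insert_str, $content and $newFileName are made-up placeholders standing in for whatever the original script already holds, and it assumes Mojolicious and Path::Tiny are installed.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Path::Tiny qw/ path /;

# Placeholder values (assumptions); in the real script these already exist.
my $insert_str  = 'http://example.com/docs/?a=1&b=2';      # URL the page was fetched from
my $content     = '<p><a href="page2.html">next</a></p>';  # scraped HTML with relative links
my $newFileName = 'master_content.html';                   # file to save to

# Build the <base> tag with Mojo::DOM; the attribute value is
# HTML-escaped when the DOM object is turned back into a string.
my $base = Mojo::DOM->new('<base>');
$base->at('base')->attr( href => $insert_str );

# Write the base tag first, then the untouched content.
# The relative links stay as they are; the browser resolves them
# against the href of the base tag.
path($newFileName)->spew_utf8( "$base", "\n", $content );
```

Strictly speaking the base tag belongs inside the head element, so if $content is a complete HTML document rather than a fragment, it may be cleaner to parse $content with Mojo::DOM and add the tag to its head instead of writing it in front.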
