PerlMonks  

Re: Web Scraping with Find / Replace (Mojo::DOM)

by beech (Parson)
on Dec 01, 2016 at 22:19 UTC ( [id://1177089] )


in reply to Web Scraping with Find / Replace

Hi,

#TODO Replace all of the links with fully qualified url's
#TODO Save the master_content to a file with the same file name

Here you go

use Path::Tiny qw/ path /;
path( $newFileName )->spew_utf8( qq{<base href="$insert_str">}, $content );

You might need to HTML-escape $insert_str; Mojo can do that part:

$ perl -Mojo -e " $dom = x(q{<base>}); $dom->at(q{base})->attr(qw{href http://example.com/?&}); print $dom "
<base href="http://example.com/?&amp;">

See Path::Tiny, https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base, https://metacpan.org/pod/ojo#x
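
If neither module is installed, the two steps above can be roughly sketched with core Perl only. This is a minimal sketch, not the code from the original script: the values assigned to $insert_str, $content and $newFileName are placeholders, and escape_attr is a hypothetical helper written here for illustration.

```perl
use strict;
use warnings;

# Placeholder values standing in for the variables of the original script.
my $insert_str  = 'http://example.com/dir/?a=1&b=2';
my $content     = '<html><head><title>t</title></head>'
                . '<body><a href="page.html">link</a></body></html>';
my $newFileName = 'saved_page.html';

# Hypothetical helper: minimal escaping for a double-quoted attribute value.
# The ampersand is replaced first so later entities are not escaped twice.
sub escape_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build the base tag with the escaped URL.
my $base_tag = sprintf '<base href="%s">', escape_attr($insert_str);

# Write the base tag followed by the page content, encoded as UTF-8.
open my $fh, '>:encoding(UTF-8)', $newFileName
    or die "Cannot write $newFileName: $!";
print {$fh} $base_tag, $content;
close $fh or die "Cannot close $newFileName: $!";
```

As in the spew_utf8 call, the base tag is simply written ahead of the content, so relative links in the saved file resolve against $insert_str.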

Re^2: Web Scraping with Find / Replace (Mojo::DOM)
by sjfranzen (Initiate) on Dec 02, 2016 at 16:25 UTC
    Thank you for your response. Unfortunately I do not understand your approach or how to include it in my script.

      Well,

      If you add a base tag to the HTML content, there is no need to rewrite relative links into absolute links; it's a shortcut provided by HTML.

      The spew part of the code does that, using a helper module to create the file.

      The second part shows creating/modifying a base tag with Mojo, which will HTML-escape the URL.
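
      Put together, the idea might look something like the sketch below, assuming Mojolicious is installed. The values of $content and $insert_str are placeholders, not data from the original script, and placing the base tag inside the head element is a choice made for this sketch rather than what the spew one-liner does.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Placeholder inputs standing in for the scraped page and its source URL.
my $content    = '<html><head><title>Demo</title></head>'
               . '<body><a href="page.html">link</a></body></html>';
my $insert_str = 'http://example.com/?&';

my $dom = Mojo::DOM->new($content);

# Stop early if the page has no head element to hold the base tag.
my $head = $dom->at('head') or die "No <head> element found\n";

# Add an empty base tag as the first child of head, then set href through
# attr() so that Mojo::DOM escapes the value when the document is rendered.
$head->prepend_content('<base>');
$dom->at('base')->attr( href => $insert_str );

my $html = $dom->to_string;
print $html, "\n";
```

      The resulting $html could then be saved with path( $newFileName )->spew_utf8( $html ), and the relative link to page.html would resolve against the base URL without being rewritten.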
