PerlMonks  

Re: Strip specific html sequence

by haukex (Archbishop)
on Dec 10, 2017 at 16:57 UTC


in reply to Strip specific html sequence

Please see Parsing HTML/XML with Regular Expressions for why it is indeed not a good idea to do this without a proper parser. In particular, look at the "spoiler" there for many examples of perfectly valid HTML that would be no fun to parse with a regex. Here's an example with Mojo::DOM:

```perl
use warnings;
use strict;
use Mojo::DOM;

my $html = <<'ENDHTML';
<html><head><title>Title</title></head>
<body>
<div><div> </div></div><div><div class="blue"></div></div>
</body>
</html>
ENDHTML

# Parse the document into a DOM tree
my $dom = Mojo::DOM->new($html);

# For every div.blue that is a direct child of a div,
# remove that parent div (and the div.blue along with it)
$dom->find('div > div.blue')
    ->each(sub{ $_->parent->remove });

print $dom;

__END__
<html><head><title>Title</title></head>
<body>
<div><div> </div></div>
</body>
</html>
```
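If the outer div might sometimes hold other content that should survive, the selector can be made stricter. This is only a sketch, not part of the answer above: it keeps the same `div > div.blue` selector but removes the parent only when the blue div is empty and is the parent's only child node (text nodes included), using Mojo::DOM's `child_nodes`. The sample input is made up for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up input: the first outer div should go, the second should stay
my $html = <<'ENDHTML';
<body>
<div><div class="blue"></div></div>
<div><p>keep me</p><div class="blue"></div></div>
</body>
ENDHTML

my $dom = Mojo::DOM->new($html);

# Same selector as above, but only remove the parent div when the
# div.blue is empty and is the parent's only child node
$dom->find('div > div.blue')->each(sub {
    my $blue   = $_;
    my $parent = $blue->parent;
    $parent->remove
        if $blue->child_nodes->size == 0
        && $parent->child_nodes->size == 1;
});

print $dom;
```

Counting `child_nodes` rather than `children` matters here: `children` only returns elements, so an outer div that also contains bare text would still be removed.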

I had a quick look at "Git for Windows", and it happens to include HTML::Parser. Elsewhere in this thread, tangent showed an example with that module, and because it's a fairly old but good module, you will find lots of other examples of it online. That Git distribution also appears to include the cpan client, so you could try installing Mojo::DOM (which is part of the Mojolicious distribution).
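For comparison, here is a minimal sketch of the same clean-up with HTML::Parser. It is not tangent's code, and it assumes the sequence to strip is exactly `<div><div class="blue"></div></div>` with nothing (not even whitespace) between the tags; the names `@pattern`, `@pending`, `is_prefix` and `feed` are made up for this example. Instead of building a tree, it watches the token stream, holds back tokens that could be the start of that sequence, and drops them only once all four have been seen.

```perl
use strict;
use warnings;
use HTML::Parser;

my $html = <<'ENDHTML';
<html><head><title>Title</title></head>
<body>
<div><div> </div></div><div><div class="blue"></div></div>
</body>
</html>
ENDHTML

# The token sequence to drop: <div><div class="blue"></div></div>
my @pattern = (
    sub { $_[0]{type} eq 'start' && $_[0]{tag} eq 'div' },
    sub { $_[0]{type} eq 'start' && $_[0]{tag} eq 'div'
          && ($_[0]{attr}{class} // '') eq 'blue' },
    sub { $_[0]{type} eq 'end' && $_[0]{tag} eq 'div' },
    sub { $_[0]{type} eq 'end' && $_[0]{tag} eq 'div' },
);

my $out = '';
my @pending;    # tokens that may still turn out to be the sequence

# True if the held-back tokens match the start of @pattern
sub is_prefix {
    for my $i (0 .. $#pending) {
        return 0 unless $pattern[$i]->($pending[$i]);
    }
    return 1;
}

sub feed {
    push @pending, shift;
    while (@pending) {
        if (is_prefix()) {
            @pending = () if @pending == @pattern;   # full match: drop it
            return;
        }
        # The first held token cannot start a match: emit it, retry the rest
        $out .= (shift @pending)->{text};
    }
}

my $p = HTML::Parser->new(
    api_version => 3,
    start_h   => [ sub { feed({ type => 'start', tag => $_[0], attr => $_[1], text => $_[2] }) },
                   'tagname, attr, text' ],
    end_h     => [ sub { feed({ type => 'end', tag => $_[0], text => $_[1] }) },
                   'tagname, text' ],
    default_h => [ sub { feed({ type => 'other', text => $_[0] // '' }) }, 'text' ],
);
$p->parse($html);
$p->eof;
$out .= $_->{text} for @pending;   # flush any partial match left at the end

print $out;
```

This is more code than the Mojo::DOM version and much less flexible (any whitespace or extra attribute between the tags defeats it), which is one reason a DOM-based approach is usually more pleasant for this kind of task.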

Re^2: Strip specific html sequence
by koober (Novice) on Dec 10, 2017 at 17:43 UTC

    That's a lot of good news to take in; I could have looked for that first, eh? Many thanks. I get the hint and will abandon this path. I'm also only now realizing that I could take another path: the HTML is Perl-generated anyway, and this bad bit comes from two separate lines, hence my supposed shortcut of cleaning them up afterwards. I could also investigate a look-ahead to prevent these bits from being written in the first place.
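Since the HTML is generated by Perl, the look-ahead idea can be sketched roughly like this. Everything here is hypothetical (the sub `blue_box` and its arguments are not from the thread): the two separate lines that print the opening and closing wrapper are folded into one sub that checks the content first and returns nothing when there is nothing to wrap.

```perl
use strict;
use warnings;

# Hypothetical generator: emit the wrapper only when there is real content,
# instead of printing the opening and closing tags on separate lines
sub blue_box {
    my (@items) = @_;
    # Look-ahead: skip the wrapper if no item contains non-whitespace text
    return '' unless grep { defined $_ && /\S/ } @items;
    return join '', '<div><div class="blue">', @items, '</div></div>';
}

# Empty boxes come back as '' and are filtered out before printing
my $html = join '', map { "$_\n" } grep { length } (
    '<body>',
    blue_box('<p>Hello</p>'),
    blue_box(),         # nothing to wrap
    blue_box('  '),     # whitespace only
    '</body>',
);

print $html;
```

Fixing the generator this way avoids parsing the output at all, at the cost of having to know the content before the wrapper is written.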
