PerlMonks  

Re^7: Looking for a module that strips an HTML tag and its associated 'TEXT'

by bliako (Prior)
on Jul 29, 2020 at 14:29 UTC (#11119981)


in reply to Re^6: Looking for a module that strips an HTML tag and its associated 'TEXT'
in thread Looking for a module that strips an HTML tag and its associated 'TEXT'

This looks like a very simple DOM manipulation: deleting nodes from the DOM. I can do that in Firefox's developer tools, and I am sure any DOM manipulator can do it too. Specifically, Mojo::DOM, suggested by marto, should also be able to do it, though I have not used it before. In short: parse your HTML into a DOM, which is a tree of HTML-tag nodes. Locate the node by XPath or some other selector (CSS selectors, in Mojo::DOM's case). Zap the node and/or its children. Work at as high a level as you can with this one, because the spec will keep changing and it will come back to bite you.
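A minimal sketch of the parse, locate, zap steps with Mojo::DOM, offered as an untested-in-production illustration rather than the definitive way to do it. The helper name `strip_elements`, the sample HTML, and the `span.junk` selector are all made up for this example; swap in whatever tag or selector your real input needs.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (not part of Mojo::DOM): remove every element that
# matches $selector, together with everything inside it, and return the
# remaining HTML as a string.
sub strip_elements {
    my ($html, $selector) = @_;

    # Parse the HTML into a DOM tree.
    my $dom = Mojo::DOM->new($html);

    # Locate the matching nodes with a CSS selector and zap each one.
    # remove() drops the element and all of its children.
    $dom->find($selector)->each(sub { $_->remove });

    # Serialize the modified tree back to HTML.
    return $dom->to_string;
}

# Made-up sample input: the span and its nested text should disappear.
my $html = '<p>Keep this <span class="junk">drop <b>all</b> of this</span> text.</p>';
print strip_elements($html, 'span.junk'), "\n";
```

The selector is the only part tied to the current requirements, so when the spec changes it should be the only thing that needs editing.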

Edit: I am not sure whether the HTML -> DOM -> manipulate -> HTML round trip will retain the whitespace between tags exactly as it was in the original HTML, which you seem to want, judging by the test cases you provided.
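One way to settle the whitespace question for your own data is to test it: parse a sample, serialize it again without touching anything, and compare. This is only a sketch; the helper name `round_trips_unchanged` and the sample string are invented here, and a pass on one sample does not guarantee a pass on every input.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: true if parsing and re-serializing $html with
# Mojo::DOM gives back exactly the same string, whitespace included.
sub round_trips_unchanged {
    my ($html) = @_;
    my $round_trip = Mojo::DOM->new($html)->to_string;
    return $round_trip eq $html ? 1 : 0;
}

# Made-up sample with newlines and indentation between tags.
my $sample = "<div>\n  <p>one</p>\n\n  <p>two</p>\n</div>\n";
print round_trips_unchanged($sample)
    ? "whitespace preserved for this sample\n"
    : "round trip changed the sample\n";
```

Running the same check over the real test cases from the thread would show whether the DOM route keeps the spacing you care about.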

Add: a regex looks "simpler", but it isn't.

bw, bliako
