Re^6: Looking for a module that strips an HTML tag and its associated 'TEXT'

I want the code to pass these simple tests:

# eliminate_tags
is(eliminate_tags("url: <a href=\"http://example.com/\">http://example.com/</a>", 'a'),
   "url: ");
is(eliminate_tags("<div>\n  <p>hoge foo.</p>\n  <p>bar tarao.</p>\n</div>", 'p'),
   "<div>\n  \n  \n</div>");

# eliminate_links
is(eliminate_links("url: <a href=\"http://example.com/\">http://example.com/</a>"),
   "url: ");
is(eliminate_links("<div>\n  <p>hoge foo.</p>\n  <p>bar tarao.</p>\n</div>"),
   "<div>\n  <p>hoge foo.</p>\n  <p>bar tarao.</p>\n</div>");

$PM = "Perl Monk's";
$MCF = "Most Clueless ~~Friar~~ ~~Abbot~~ ~~Bishop~~ ~~Pontiff~~ ~~Deacon~~ ~~Curate~~ ~~Priest~~ Vicar";
$nysus = $PM . ' ' . $MCF;


Replies are listed 'Best First'.
Re^7: Looking for a module that strips an HTML tag and its associated 'TEXT' by bliako (Monsignor) on Jul 29, 2020 at 14:29 UTC
This looks like a very simple DOM manipulation: deleting nodes from the DOM. I can do that in Firefox's developer tools, and I am sure any DOM manipulator can do it too. Specifically, the Mojo::DOM suggested by marto should also be able to do it, though I have not used it before.

In short: parse your HTML into a DOM, which is a tree of HTML-tag nodes. Locate the node by XPath or some other exotic selector. Zap the node and/or its children. Work at as high a level as you can with this one, because the spec will keep changing and it will come back to bite you.

Edit: I am not sure whether the round trip HTML -> DOM -> manipulate -> HTML will retain exactly the whitespace between tags from the original HTML, which your test cases suggest you want to keep.

Add: a regex "is simpler", but it isn't.

bw, bliako
Re^7: Looking for a module that strips an HTML tag and its associated 'TEXT' by marto (Cardinal) on Jul 29, 2020 at 14:29 UTC
And did you look at my earlier suggestion?
Re^8: Looking for a module that strips an HTML tag and its associated 'TEXT' by nysus (Parson) on Jul 29, 2020 at 14:40 UTC
I looked at Mojo::DOM briefly. It's a general-purpose tool. Was hoping for a module that let me knock this out in like two lines. Mojo::DOM is my fallback plan if I can't find what I'm looking for.
Re^9: Looking for a module that strips an HTML tag and its associated 'TEXT' by marto (Cardinal) on Jul 29, 2020 at 14:57 UTC
"Was hoping for a module that let me knock this out in like two lines." Ignoring the many dependent modules and literally thousands of lines of code :P `use feature 'say'; use Mojo::DOM; my $html = 'url: <a href="http://example.com">http://example.com</a>'; my $dom = Mojo::DOM->new( $html ); say $dom->at('a')->remove;`
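One thing to note: `at('a')` returns only the first match, so the snippet above removes a single link. Below is a minimal sketch of how the two functions named in the tests at the top of the thread might be wrapped around Mojo::DOM. The function bodies are an assumption for illustration, not something posted in the thread; `find`, `remove` and `to_string` are Mojo::DOM methods, and `map` comes from Mojo::Collection.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper matching the name used in the tests: remove every
# element with one of the given tag names, together with its contents.
sub eliminate_tags {
    my ($html, @tags) = @_;
    my $dom = Mojo::DOM->new($html);
    for my $tag (@tags) {
        # find() returns a Mojo::Collection of every match;
        # map('remove') calls remove() on each element in it.
        $dom->find($tag)->map('remove');
    }
    return $dom->to_string;
}

# Hypothetical helper: links are just <a> elements.
sub eliminate_links {
    my ($html) = @_;
    return eliminate_tags($html, 'a');
}

print eliminate_links('url: <a href="http://example.com/">http://example.com/</a>'), "\n";
```

As far as I can tell, Mojo::DOM keeps the whitespace-only text nodes between tags, so the `<div>` tests should come out with only the matched elements removed. Attributes and entities are re-rendered on output, though, so it is worth running the full test list against your installed version rather than taking this on trust.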
Re^10: Looking for a module that strips an HTML tag and its associated 'TEXT' by nysus (Parson) on Jul 29, 2020 at 15:08 UTC
Re^11: Looking for a module that strips an HTML tag and its associated 'TEXT' by marto (Cardinal) on Jul 29, 2020 at 15:31 UTC
Re^11: Looking for a module that strips an HTML tag and its associated 'TEXT' by marto (Cardinal) on Jul 29, 2020 at 16:19 UTC
Re^10: Looking for a module that strips an HTML tag and its associated 'TEXT' by nysus (Parson) on Jul 29, 2020 at 15:27 UTC
Re^11: Looking for a module that strips an HTML tag and its associated 'TEXT' by marto (Cardinal) on Jul 29, 2020 at 15:37 UTC

