PerlMonks  

Re^8: Looking for a module that strips an HTML tag and its associated 'TEXT'

by nysus (Parson)
on Jul 29, 2020 at 14:40 UTC (#11119982)


in reply to Re^7: Looking for a module that strips an HTML tag and its associated 'TEXT'
in thread Looking for a module that strips an HTML tag and its associated 'TEXT'

I looked at Mojo::DOM briefly. It's a general-purpose tool. I was hoping for a module that would let me knock this out in like two lines. Mojo::DOM is my fallback plan if I can't find what I'm looking for.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Re^9: Looking for a module that strips an HTML tag and its associated 'TEXT'
by marto (Cardinal) on Jul 29, 2020 at 14:57 UTC

    "Was hoping for a module that let me knock this out in like two lines."

    Ignoring the many dependent modules and literally thousands of lines of code :P

    use feature 'say';
    use Mojo::DOM;

    my $html = 'url: <a href="http://example.com">http://example.com</a>';
    my $dom  = Mojo::DOM->new( $html );
    say $dom->at('a')->remove;

      Almost, not quite. Need this:

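      # Strips every occurrence of a specific tag (and its text) from a string
      sub eliminate_tags {
          my ($page, $tag) = @_;
          my $dom = Mojo::DOM->new;
          # $node rather than $b, which is reserved for sort comparisons
          foreach my $node ($dom->parse($page)->find($tag)->each) {
              $node->remove;
          }
          return $dom;
      }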

      So my beef is that: 1) I'd have to be familiar enough with Mojo::DOM to figure out it could do this (I'm not), so I needed to come to PerlMonks to find someone like you to help, and 2) I have to spend 20 min. wading through monstrous documentation to figure out how to use it for something simple.

      So why isn't a specific tool better than a general purpose tool? You're saying a specific tool is inferior because it has more dependencies?

        Sometimes you may have to write some code yourself... I posted a general example, based on your example data, simply to show you how it can be done. This place is for helping people learn; people aren't always going to do it all for you. If you are writing code, reading the documentation for the tools you are going to use is literally the bare minimum you can do. 20 minutes to learn how to use something as portable and powerful as this seems insignificant compared to the productivity gains, and to the alternative of writing all of the code required to do this properly yourself. If you think this is "something simple", you don't understand the scope of the problem at all. Look what you have achieved with this in 20 minutes! What is "monstrous" about the Mojo::DOM docs? They're littered with examples of how to use each method in practice, with before and after data displayed.

        "So why isn't a specific tool better than a general purpose tool? You're saying a specific tool is inferior because it has more dependencies?"

        You tried two, neither of which did what you imagined it would, and you seemingly didn't take their dependencies into account. Having spent a long time working with Perl and data, solving problems and writing code, I suggested what I've found to be a tried and tested method that takes the pain out of what you're trying to do, in this instance. I didn't say anything about any other specific tools or problems.

        If you want to reduce the number of lines a little:

        my $dom = Mojo::DOM->new( $html );
        $dom->find( $tag )->each( sub { $_->remove } );
        return $dom;
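
For anyone comparing the snippets in this subthread, here is a minimal sketch of why the first one was "almost, not quite": `at` returns only the first matching element, while `find` returns every match. The sample HTML and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Made-up sample input containing two links.
my $html = '<p>one <a href="/a">first</a> two <a href="/b">second</a> three</p>';

# at() returns only the first matching element (or undef when nothing
# matches, so real code should check before calling remove).
my $first_only = Mojo::DOM->new($html);
$first_only->at('a')->remove;

# find() returns a collection of every match; each() runs the callback
# once per element with $_ set to that element.
my $all = Mojo::DOM->new($html);
$all->find('a')->each(sub { $_->remove });

# Mojo::DOM objects stringify to their rendered markup.
say "at:   $first_only";
say "find: $all";
```

With `at`, the second link and its text should still be present in the output; with `find`, both links and their text should be gone.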

      Would it make sense, in your opinion, to create a new module that is just a wrapper for Mojo::DOM, with the specific use case of deleting nodes with a certain tag, and submit it to CPAN?

        If it's something you'd find useful, a library you could use from other Perl code, along with a command line script that calls this module, could be useful to some. If you find it useful, it may be worth sharing, provided you don't mind the personal overhead (time etc.) that goes along with it. A reverse dependency lookup shows that people are relying on Mojo::DOM to munge data, but those modules don't seem to be as general-purpose as the thing you're suggesting.
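
As a rough idea of what such a wrapper might look like, here is a minimal sketch. The package name `Local::StripTag` and the function `strip_tags` are hypothetical, chosen only for illustration; they are not an existing CPAN module. It assumes the HTML arrives as a string and that the caller wants a string back.

```perl
package Local::StripTag;    # hypothetical name, not a real CPAN module

use strict;
use warnings;
use Mojo::DOM;

# Remove every element matching any of the given tag names (or CSS
# selectors), together with its contents, and return the remaining markup.
sub strip_tags {
    my ($html, @tags) = @_;
    my $dom = Mojo::DOM->new($html);
    for my $tag (@tags) {
        $dom->find($tag)->each(sub { $_->remove });
    }
    return $dom->to_string;
}

package main;

# Command-line style use: tag names come from @ARGV, HTML from STDIN.
# Nothing happens when no arguments are given.
if (@ARGV) {
    my $html = do { local $/; <STDIN> };
    print Local::StripTag::strip_tags($html, @ARGV);
}
```

Saved as, say, `strip_tags.pl` (again a made-up name), it could be run as `perl strip_tags.pl script style < page.html`. A real distribution would split the package into its own `.pm` file and add tests and documentation, which is the overhead mentioned above.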
