Re: Strip HTML tags again


PerlMonks

by Ovid (Cardinal)

on Jun 30, 2002 at 20:20 UTC ( [id://178410]=note )

in reply to Strip HTML tags again

This problem looks tailor-made for my HTML::TokeParser::Simple module, when combined with HTML::Tagset. The following test will demonstrate:

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser::Simple;
use HTML::Tagset;

my $html = <<'END_HTML';
<a href="mylink">text1</a>
<this is normal text>
END_HTML

my $p = HTML::TokeParser::Simple->new( \$html );

while ( my $token = $p->get_token ) {
    next if ! $token->is_text 
              and 
              exists $HTML::Tagset::isKnown{ $token->return_tag };
    print $token->return_text;
}

Result:

text1
<this is normal text>

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re^2: Strip HTML tags again
by Your Mother (Archbishop) on Mar 05, 2009 at 01:02 UTC

++ for the original. I'm posting an updated version because some changes to the module seem to have borked your example. This is an in-place stripper, based on the one you posted, with the newer, working syntax (get_tag and as_is where the original used return_tag and return_text).

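use HTML::TokeParser::Simple;
use HTML::Tagset;

# Strips known HTML tags from the caller's string in place.
sub strip_html {
    my $renew = "";
    my $p = HTML::TokeParser::Simple->new(\$_[0]);
    # get_tag yields nothing for non-tag tokens (comments and the like)
    no warnings "uninitialized";
    while ( my $token = $p->get_token ) {
        # skip a token only if it is a tag that HTML::Tagset knows about
        next if ! $token->is_text
            and
            exists $HTML::Tagset::isKnown{ $token->get_tag };
        $renew .= $token->as_is;
    }
    # @_ aliases the caller's variable, so this overwrites the original
    $_[0] = $renew;
}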
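
For anyone who wants to try it, here is a minimal sketch of calling the in-place stripper on the sample HTML from the original post. The sub is repeated so the script runs on its own; the $html variable and the heredoc are only for illustration, and I'm assuming both HTML::TokeParser::Simple and HTML::Tagset are installed from CPAN. With a current version of the module it should leave the same two lines shown under "Result" above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
use HTML::Tagset;

# Same logic as the sub above, repeated so this script is self-contained.
sub strip_html {
    my $renew = "";
    my $p = HTML::TokeParser::Simple->new( \$_[0] );
    no warnings "uninitialized";    # get_tag yields nothing for non-tag tokens
    while ( my $token = $p->get_token ) {
        # drop the token only if it is a tag listed in %HTML::Tagset::isKnown
        next if ! $token->is_text
            and exists $HTML::Tagset::isKnown{ $token->get_tag };
        $renew .= $token->as_is;
    }
    $_[0] = $renew;    # @_ aliases the caller's variable: modified in place
}

# Illustrative input: one real tag pair and one bracketed run of plain text.
my $html = <<'END_HTML';
<a href="mylink">text1</a>
<this is normal text>
END_HTML

strip_html($html);    # no return value needed, $html itself is rewritten
print $html;
```

Because the sub assigns to $_[0], it has to be called with a variable rather than a string literal; passing a constant would die with a "Modification of a read-only value" error.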