Re: Re: Strip HTML tags again

OK, I'll try to explain. If someone types '<b>some text</b>', it should not be displayed as bold text in the chat window, and displaying the HTML source is not a good idea either. I just need to strip all tags from the line.

And '<some text>' should not be stripped. All the regexp-based solutions suggested so far strip it too. I'm still looking for a regexp-based solution that uses the %HTML::Tagset::isKnown hash to strip only real HTML tags.

--dda

Re: Re: Re: Strip HTML tags again by little (Curate) on Jul 01, 2002 at 12:55 UTC
Look up the POD (or your preferred docs) for HTML::Tagset. To quote: "hashset %HTML::Tagset::isKnown: This hashset lists all known HTML elements." So you have to compare your match against that list ...

Have a nice day
All decision is left to your taste

Addendum: Look through the previous suggestions as well. At least try them, and ask again if you get an error or otherwise get stuck. :-)
Re: Re: Re: Re: Strip HTML tags again by dda (Friar) on Jul 01, 2002 at 13:01 UTC
The problem is how to extract 'my match' from the regexp shown earlier (or another one; please suggest one). I know about that hashset; what I need is to apply it in my sub.
--dda
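A minimal sketch of one way to get at "my match": capture the tag name in the regex and make the keep-or-strip decision in the replacement with the /e modifier, looking the lowercased name up in %HTML::Tagset::isKnown. It assumes a tag starts with '<' (or '</') followed directly by a letter. The name strip_known_tags and the fallback tag list are hypothetical, there only so the sketch runs without HTML::Tagset installed.

```perl
use strict;
use warnings;

# Known tag names: %HTML::Tagset::isKnown if the module is installed,
# otherwise a small hypothetical fallback list, for illustration only.
my %known;
if (eval { require HTML::Tagset; 1 }) {
    %known = %HTML::Tagset::isKnown;
}
else {
    %known = map { $_ => 1 } qw(a b i u p br em strong code pre font img);
}

# Hypothetical helper: remove start, end and empty-element tags whose name
# is a known HTML element; leave anything else (e.g. '<some text>') alone.
sub strip_known_tags {
    my ($line) = @_;
    $line =~ s{
        </?                       # '<', plus '/' for an end tag
        ([A-Za-z][A-Za-z0-9]*)    # capture 1: the tag name ("my match")
        (?:\s[^>]*)?              # optional attributes
        /?>                       # optional '/' of an empty element, then '>'
    }{ $known{lc $1} ? '' : $& }gex;
    return $line;
}
```

It drops into the same loop used elsewhere in the thread: `while (<>) { print strip_known_tags($_) }`. Unlike a paired-tag regex, it strips each tag on its own, so a lone `<br>` or an unclosed `<b>` is removed as well, while `<some text>` and `a < b` stay untouched.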
Re: Re: Re: Re: Re: Strip HTML tags again by amphiplex (Monk) on Jul 01, 2002 at 13:18 UTC
Hi!

I think this does what you want:

```perl
use strict;
use warnings;
use HTML::Tagset;

my %tags = %HTML::Tagset::isKnown;
# Build "(a|p|code|...)"; quotemeta guards against metacharacters in the keys.
my $tagpattern = "(" . join('|', map { quotemeta } keys %tags) . ")";
print STDERR "$tagpattern\n";

while (<>) {
    print strip_html_tags($_);
}

# Replace each <tag ...>text</tag> pair (known tags only) with its text.
sub strip_html_tags {
    my $line = shift;
    $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
    return $line;
}
```

I first create the string $tagpattern by putting a "|" between all known HTML tags and surrounding the whole thing with parentheses. This gives something like "(a|p|code.....)", which is used later in the subroutine to check for valid HTML tags. The regex looks a bit complicated and I am sure it can be written much better, but I believe it is sufficient for your purpose. Note that this only works for tags that are on one line, and it could get you into trouble if there are < or > signs inside a tag (I don't know if this is possible in HTML).

update: It would probably be a lot wiser to use Ovid's code than my homegrown regex.

----
kurt
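On the open question at the end of that post: HTML does allow '>' (and '<') inside a quoted attribute value, as in title="x > y". A small sketch in core Perl compares an attribute pattern built on [^>]* with one that consumes quoted values whole. The sample string is made up, and the known-tag check is left out to keep the comparison short.

```perl
use strict;
use warnings;

# Made-up sample: the title attribute contains a '>' inside quotes.
my $html = q{<a title="x > y">link</a>};

# Naive pattern: [^>]* stops at the first '>', even inside the quotes.
(my $naive = $html) =~ s{</?[A-Za-z][A-Za-z0-9]*(?:\s[^>]*)?/?>}{}g;

# Quote-aware pattern: a quoted value is consumed as a unit, so a '>'
# inside it no longer ends the tag.
(my $aware = $html) =~ s{
    </?[A-Za-z][A-Za-z0-9]*                       # '<', optional '/', tag name
    (?: \s (?: "[^"]*" | '[^']*' | [^'">] )* )?   # attributes
    /?>                                           # optional '/', then '>'
}{}gx;

print "naive: $naive\n";    # leaves part of the attribute value behind
print "aware: $aware\n";
```

The quote-aware attribute part can stand in for `(?:\s[^>]*)?` in a known-tag filter. A tag with an unbalanced quote is then not matched at all, so it is left in the text rather than half-stripped.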
Re: Re: Re: Re: Re: Re: Strip HTML tags again by dda (Friar) on Jul 01, 2002 at 13:38 UTC
Re: Re: Re: Re: Re: Strip HTML tags again by little (Curate) on Jul 01, 2002 at 13:15 UTC
Did you look further than ides' suggestion? Did you try Ovid's suggestion?

Have a nice day
All decision is left to your taste
Re: Re: Re: Re: Re: Re: Strip HTML tags again by dda (Friar) on Jul 01, 2002 at 13:36 UTC

