http://qs321.pair.com?node_id=861995


in reply to Dynamically cleaning up HTML fragments

Glad to see that you have noticed HTML::StripScripts::Parser. I'm the maintainer, but not the guy who did the great work of writing it originally.

It fulfils all of your listed requirements and is in active use on our production sites.

This code should do what you need (untested):

    use HTML::StripScripts::Parser;

    my $s = HTML::StripScripts::Parser->new({
        Context   => 'Flow',

        # Only allow these tags
        BanAllBut => [qw(p a img h3 div em)],

        # Allow src and href
        AllowSrc  => 1,
        AllowHref => 1,

        Rules => {
            # remove empty p tags
            p => sub { return length $_[1]->{content} },

            # a must have a local href
            # (the tag callback gets $filter and an element hashref
            #  with tag, attr and content keys)
            a => {
                href => \&strip_abs_uri,
                tag  => sub { return $_[1]->{attr}{href} ? 1 : 0 },
            },

            # img must have a local src
            img => {
                src => \&strip_abs_uri,
                tag => sub { return $_[1]->{attr}{src} ? 1 : 0 },
            },

            # Allow id and class for all tags
            '*' => {
                id    => 1,
                class => 1,
            },
        },
    });

    # Attribute callback: return the value to keep the attribute,
    # undef to drop it. Anything containing "://" is treated as
    # absolute and removed, so only local URIs survive.
    sub strip_abs_uri {
        my ( $filter, $tag, $attr_name, $attr_val ) = @_;
        return $attr_val =~ m{://} ? undef : $attr_val;
    }

    # $html holds the fragment to be cleaned
    print $s->filter_html($html);
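The attribute callback is plain Perl, so it can be checked on its own without installing the module. Below is a sketch of a stricter variant of strip_abs_uri; it is my own assumption, not something HTML::StripScripts provides. It assumes the callback is called as ($filter, $tag, $attr_name, $attr_val) and that returning undef drops the attribute (check the HTML::StripScripts docs for the exact contract). Besides "http://..." it also treats any value starting with a URI scheme ("javascript:", "mailto:") or a protocol-relative "//host" as absolute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stricter variant of strip_abs_uri (not part of
# HTML::StripScripts). Returns undef to drop the attribute when the value
# looks absolute, otherwise returns the value unchanged.
# "Absolute" here means: starts with a URI scheme such as "http:",
# "mailto:" or "javascript:", or is protocol-relative ("//host/path").
sub strip_abs_uri {
    my ( $filter, $tag, $attr_name, $attr_val ) = @_;
    return undef unless defined $attr_val;
    return undef if $attr_val =~ m{^\s*(?:[A-Za-z][A-Za-z0-9+.\-]*:|//)};
    return $attr_val;
}

# Exercise the callback directly; $filter is unused here, so pass undef.
for my $uri ( '/images/logo.png', 'page.html?q=a:b', 'http://example.com/x',
              '//example.com/x', 'javascript:alert(1)' ) {
    my $kept = strip_abs_uri( undef, 'a', 'href', $uri );
    printf "%-22s => %s\n", $uri, defined $kept ? 'kept' : 'dropped';
}
```

The extra checks matter because a test on "://" alone lets protocol-relative links such as "//example.com/x" through, and those point off-site.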