PerlMonks
Re: Cleaning up HTML
by clinton (Priest) on Dec 23, 2007 at 12:24 UTC
It is certainly not smaller, and is probably slower (I haven't done any benchmarks), but it is very flexible and powerful: try using HTML::StripScripts via HTML::StripScripts::Parser. It will churn through your HTML (either a complete HTML page or an HTML snippet), tidy it up, fix tag nesting, remove scripts, remove unknown attributes, etc. Through the Rules => {} parameter, you can specify exactly which tags and attributes you want to allow through, adding regexes or callbacks to customise the results.
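A minimal sketch of what that might look like, written from memory of the module's documentation rather than tested against your HTML. The sample input, the particular tags allowed and the `align` callback are illustrative assumptions only; check the Rules section of the POD for the exact callback signature before relying on it.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser;

# Hypothetical input: a snippet with a script and some stray attributes.
my $dirty = '<p align="CENTER" onclick="evil()">Hello <b>world</b>'
          . '<script>alert(1)</script><font size="7">big</font>';

my $filter = HTML::StripScripts::Parser->new(
    {
        Context => 'Flow',    # input is a snippet, not a complete page
        Rules   => {
            b    => 1,        # allow <b> as-is
            font => 0,        # drop <font> tags
            p    => {
                # Attribute callback (assumed signature, per the POD as I
                # recall it): return the value to keep, or undef to drop.
                align => sub {
                    my ( $filter, $tag, $attr_name, $attr_val ) = @_;
                    return lc $attr_val;
                },
                '*' => 0,     # deny every other attribute on <p>
            },
        },
    },
);

# filter_html() parses the string and returns the cleaned-up HTML.
my $clean = $filter->filter_html($dirty);
print $clean, "\n";
```

For a complete page you would set Context => 'Document' instead, and use parse_file() followed by filtered_document() if the HTML lives in a file.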
Clint

Update: I'm the maintainer of HTML::StripScripts, and I added the Rules => {} parameter, which makes it easier to customise. But all credit for the underlying module must go to Nick Cleaton, the original author, who did a very, very good job indeed.

Update: Tidied up the HTML