Ensuring HTML is "balanced"
by skx (Parson) on Mar 06, 2006 at 20:22 UTC
skx has asked for the wisdom of the Perl Monks concerning the following question:
I'm interested in modifying user-submitted HTML so that all tags are balanced.
E.g. "<b><i>test</b>" is obviously broken HTML, because the <i> is never closed.
I realise I can do simple cases with regexps, but to do it properly I probably want to use HTML::TreeBuilder, or similar.
The problem is I'm not 100% sure how to start. I can certainly keep a stack of opened tags, and know when something is broken. But emitting the missing closing tags in the right order is a bit tricky.
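To make the stack idea concrete, here is a rough sketch in plain Perl. It is only an illustration under several assumptions: the sub name balance_tags and the %VOID list are made up for this example, tags are picked out with a naive regex (so a ">" inside a comment or attribute value will confuse it), and stray closing tags are simply dropped. A real solution would take its tokens from a proper parser instead.

```perl
use strict;
use warnings;

# Elements that never take a closing tag (partial list, illustration only).
my %VOID = map { $_ => 1 } qw(br hr img input meta link);

# Hypothetical helper: returns $html with unclosed tags closed and
# unmatched closing tags dropped.
sub balance_tags {
    my ($html) = @_;
    my @open;        # stack of currently open tag names
    my $out = '';

    # Split into tags and text; the capturing group keeps the tags.
    for my $token (split /(<[^>]*>)/, $html) {
        next if $token eq '';

        if ($token =~ m{^<\s*/\s*([a-zA-Z][a-zA-Z0-9]*)\s*>$}) {
            # Closing tag.
            my $name = lc $1;

            # Nothing of that name is open: drop the stray tag.
            next unless grep { $_ eq $name } @open;

            # Close everything opened after the match, innermost first,
            # then the matching tag itself.
            while (@open) {
                my $top = pop @open;
                $out .= "</$top>";
                last if $top eq $name;
            }
        }
        elsif ($token =~ m{^<\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>$}) {
            # Opening tag: remember it unless it is void or self-closing.
            my ($name, $self_closing) = (lc $1, $2);
            push @open, $name unless $self_closing || $VOID{$name};
            $out .= $token;
        }
        else {
            # Text, comments, doctype and so on pass through untouched.
            $out .= $token;
        }
    }

    # Close whatever is still open at the end, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}
```

With the example from the question:

```perl
print balance_tags('<b><i>test</b>'), "\n";   # prints <b><i>test</i></b>
```

Popping the stack until the matching name turns up is what gets the closing tags out in the right order: the most recently opened tag is always closed first.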
Surprisingly, CPAN didn't seem to have anything to offer when I searched for terms such as 'html balance', so if there is existing code I've not found it.