PerlMonks
Tag filtering: a standard mechanism?
by thpfft (Chaplain) on Sep 13, 2001 at 06:13 UTC ( [id://112088]=perlquestion )
thpfft has asked for the wisdom of the Perl Monks concerning the following question:

I'm getting that 'look! I made a wheel!' feeling again. I've just put this in a script, not for the first time:

    # (?:\r?\n){2,} rather than [\r\n]{2,}, so a lone CRLF is not taken for a paragraph break
    my $text = "<p>" . join('</p><p>', split(/(?:\r?\n){2,}/, $self->body_text())) . "</p>";

to create paragraphs in the body text of records coming out of a database, of course. Which everyone has to do. And then I'm going to start thinking about links, and then subheadings, and soon I'm going to find myself inventing a simple markup language that my users will be able to handle, and we all know that's a Sin.

I've dug around on CPAN and here looking for something that will help me to format text nicely but still protect users from themselves and one another, and I've turned up very little. There's a lot of discussion, but not much code.

What I really need, and I assume the 47 million other people who write content-management systems or bulletin boards also need, is a standard and reasonably slim way of filtering input and output so that an arbitrary collection of HTML tags is allowed through but everything else is removed. If it can detect non-marked-up text and act appropriately, that's a bonus.

HTML::FromText does a good job of deducing markup from text formatting, but it doesn't handle headings very flexibly, or anything to do with document structure or linking. HTML::Filter, which is deprecated anyway, removes selected content between tags rather than the tags themselves. HTML::Parser is overkill here. HTML::SimpleParse is closest, but still at a bit of a tangent and rather overqualified for the job.

It's not hard to see how I could do this to my own requirements. Everything and Slashcode do it essentially the same way: a two-step regex-based process of stripping out all but the allowed tags, and then stripping out all but the allowed attributes from those that remain.
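For concreteness, here is a minimal sketch of that two-step whitelist idea in core Perl. It is not the Everything or Slashcode code; the `%ALLOWED` table, the sub names `filter_tags` and `filter_attributes`, and the href scheme check are all assumptions made up for illustration, and a regex-based filter like this should not be assumed watertight.

```perl
use strict;
use warnings;

# Hypothetical whitelist: tag name => attributes allowed on that tag.
my %ALLOWED = (
    p      => [],
    b      => [],
    i      => [],
    em     => [],
    strong => [],
    a      => ['href'],
);

# Step 2: rebuild a tag's attribute string, keeping only quoted,
# whitelisted attributes. Unquoted attributes are simply dropped.
sub filter_attributes {
    my ($tag, $attr_text) = @_;
    my %ok  = map { $_ => 1 } @{ $ALLOWED{$tag} };
    my $out = '';
    while ($attr_text =~ /([a-zA-Z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
        my ($name, $value) = (lc $1, defined $2 ? $2 : $3);
        next unless $ok{$name};
        # Assumption: only http(s), mailto and site-relative links are wanted.
        next if $name eq 'href' && $value !~ m{^(?:https?://|mailto:|/)}i;
        $value =~ s/"/&quot;/g;
        $out .= qq{ $name="$value"};
    }
    return $out;
}

# Step 1: drop every tag that is not whitelisted, and escape any angle
# bracket that is not part of a recognisable tag.
sub filter_tags {
    my ($html) = @_;
    $html =~ s{<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>|([<>])}{
        defined $4               ? ($4 eq '<' ? '&lt;' : '&gt;')
          : !exists $ALLOWED{lc $2} ? ''
          : $1                      ? '</' . lc($2) . '>'
          :                           '<' . lc($2) . filter_attributes(lc $2, $3) . '>';
    }ge;
    return $html;
}
```

Only the tags are removed, not the text between them, so the body of a stripped `<script>` element survives as plain text. Called as `filter_tags($self->body_text())`, this would run before the paragraph-wrapping step.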
But this seems like a simple thing that ought to exist, and things like that usually turn out to be on CPAN somewhere. So, er, does it exist? Is anyone working on it? Should I try to write it? Is it possible to make it watertight? Should I just use HTML::Parser and shut up?

For building it, my first choice would be to try subclassing or developing from HTML::SimpleParse, but it would have to be configurable from the outside, and very tolerant of broken markup, or at least able to return a useful complaint. Second choice would be to encapsulate the Everything approach.

Any comments, please? A slap and a 'use HTML::Foo' would make me very happy...
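To make the "just use HTML::Parser" option concrete, here is a rough sketch of a whitelist filter built on HTML::Parser's version 3 handler interface, with the whitelist passed in from outside. The sub names `filter_html` and `escape_attr` and the shape of the whitelist hash are assumptions for illustration only. The sketch does not validate URL schemes, and text is passed through as the parser saw it, so it is not a claim of watertightness.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Re-escape an attribute value (HTML::Parser hands values over decoded).
sub escape_attr {
    my ($v) = @_;
    $v =~ s/&/&amp;/g;
    $v =~ s/</&lt;/g;
    $v =~ s/>/&gt;/g;
    $v =~ s/"/&quot;/g;
    return $v;
}

# $allowed is a hash ref: tag name => array ref of permitted attributes.
sub filter_html {
    my ($html, $allowed) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start tags: emit only whitelisted tags and attributes.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                return unless $allowed->{$tag};
                $out .= "<$tag";
                for my $name (@{ $allowed->{$tag} }) {
                    next unless defined $attr->{$name};
                    $out .= qq{ $name="} . escape_attr($attr->{$name}) . '"';
                }
                $out .= '>';
            },
            'tagname, attr',
        ],
        # End tags: emit only if the tag is whitelisted.
        end_h => [
            sub { my ($tag) = @_; $out .= "</$tag>" if $allowed->{$tag} },
            'tagname',
        ],
        # Text: passed through unchanged.
        text_h => [ sub { $out .= $_[0] }, 'text' ],
        # No handlers for comments, declarations or processing
        # instructions, so those events produce no output.
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}
```

A call might look like `filter_html($text, { p => [], b => [], a => ['href'] })`. Because the parser tokenises the markup itself, broken or oddly quoted tags are handled by HTML::Parser rather than by hand-written regexes.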