BBCode parser/validator needed

by OnionKnight (Sexton)
on Sep 03, 2005 at 07:26 UTC

OnionKnight has asked for the wisdom of the Perl Monks concerning the following question:

I want to write a subroutine that parses BB tags, but I'm not sure how to do it. Replacing them with HTML isn't hard, but what if someone opens a tag and doesn't close it (i.e. the [/B] is missing)? I could count the opened and closed tags and check whether they match, and if they don't, print an error message or correct it.
The problem is that a regex returns true or false in scalar context, which I don't want, and a list of captures in list context. I could capture all the tags, assign them to a list and check that, but don't you guys usually rave about capturing being really slow?
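
The counting idea above can be sketched with a stack instead of two counters, which also catches tags closed in the wrong order. This is only a minimal sketch: check_bbcode is a hypothetical name, the tag whitelist is an assumption, and it does not treat the contents of [code] specially. It relies on m//g in scalar context, which steps through the matches one at a time while still filling $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of problems found in $text.
# The tag whitelist is an assumption; adjust it to the tags the board allows.
sub check_bbcode {
    my ($text) = @_;
    my %known = map { $_ => 1 } qw(b i u code);
    my (@stack, @errors);

    # In scalar context, //g resumes where the previous match ended,
    # so each pass through the loop sees the next tag in $1 and $2.
    while ($text =~ m{\[(/?)(\w+)\]}g) {
        my ($closing, $tag) = ($1, lc $2);
        next unless $known{$tag};
        if (!$closing) {
            push @stack, $tag;                 # remember the open tag
        }
        elsif (@stack && $stack[-1] eq $tag) {
            pop @stack;                        # properly nested close
        }
        else {
            push @errors, "unexpected [/${tag}]";
        }
    }
    # Anything still on the stack was never closed; report innermost first.
    push @errors, map { "unclosed [${_}]" } reverse @stack;
    return @errors;
}

print "$_\n" for check_bbcode('[b]bold [i]both[/b] oops');
```

An empty list means the tags balance, so the caller can decide whether to reject the post or try to repair it.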

Also, I am using >>\d+ to refer to other posters; for example, >>1 is a reference to the first poster. But I don't want this substitution (wrapping it in an <a> tag) to be done inside a [code] tag, so how do I do this? I was thinking of some look-behind/look-ahead thing like /(?<!\[code\])>>\d+(?!\[\/code\])/ but will that work if a user were to write, for example:
'[code]print "funny text";[/code] >>4 blargh [code]5>>1 is 2[/code]'

The desired output would be:
'<span class="code">print &quot;funny text&quot;;</span> <a href="#4">&gt;&gt;4</a> blargh <span class="code">5&gt;&gt;1 is 2</span>'

But I suspect that the ">>4" won't get substituted with an <a> tag. (Entity names are already taken care of by escapeHTML, so don't worry about that.)
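
One way to sidestep the look-behind question is to split the text on the [code] blocks and only substitute in the pieces between them. The following is a rough sketch, not a tested solution for a real board: link_refs is a hypothetical name, and it assumes the text has already been through escapeHTML (so ">>" arrives as "&gt;&gt;") and that [code] blocks are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes $text is already HTML-escaped and that
# [code] blocks do not nest.
sub link_refs {
    my ($text) = @_;

    # The capturing group makes split return the [code] blocks as well,
    # so they end up at the odd indexes of @parts.
    my @parts = split m{(\[code\].*?\[/code\])}s, $text;

    for my $i (0 .. $#parts) {
        if ($i % 2) {
            # A [code] block: wrap it, but leave its contents alone.
            $parts[$i] =~ s{\A\[code\](.*)\[/code\]\z}{<span class="code">$1</span>}s;
        }
        else {
            # Ordinary text: turn >>N into a link to post N.
            $parts[$i] =~ s{&gt;&gt;(\d+)}{<a href="#$1">&gt;&gt;$1</a>}g;
        }
    }
    return join '', @parts;
}

print link_refs('[code]5&gt;&gt;1 is 2[/code] &gt;&gt;4 blargh'), "\n";
```

Because the regex never has to look backwards for an opening [code], a reference sitting between two code blocks is still linked, while "5>>1" inside a block is left as it is.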

Also, is it possible to get all this done as valid XHTML (i.e. tags are closed strictly in the reverse order they were opened) without too much work?
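
Strict nesting can be kept by tracking the open tags on a stack while emitting the HTML and closing whatever is still open, innermost first. A minimal sketch under stated assumptions: bbcode_to_xhtml and the %html tag table are hypothetical, only [b] and [i] are handled, and the input is assumed to be HTML-escaped already.

```perl
use strict;
use warnings;

# Hypothetical tag table; the mapping to HTML elements is an assumption.
my %html = (b => 'strong', i => 'em');

sub bbcode_to_xhtml {
    my ($text) = @_;
    my @open;
    my $out = '';

    # The capturing group makes split keep the tags as separate pieces.
    for my $piece (split m{(\[/?\w+\])}, $text) {
        if ($piece =~ m{\A\[(/?)(\w+)\]\z} && $html{lc $2}) {
            my ($closing, $tag) = ($1, lc $2);
            if (!$closing) {
                push @open, $tag;
                $out .= "<$html{$tag}>";
            }
            elsif (grep { $_ eq $tag } @open) {
                # Close everything opened after $tag, innermost first.
                while (@open) {
                    my $top = pop @open;
                    $out .= "</$html{$top}>";
                    last if $top eq $tag;
                }
            }
            # A closing tag with no matching opener is silently dropped.
        }
        else {
            $out .= $piece;    # plain text or an unknown tag
        }
    }

    # Close whatever the poster forgot, again innermost first.
    $out .= "</$html{$_}>" for reverse @open;
    return $out;
}

print bbcode_to_xhtml('[b]bold [i]both[/b] tail'), "\n";
```

Whether to repair silently like this or to reject the post with an error is a design choice; the same stack supports either.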

2005-09-03 Retitled by Arunbear, as per Monastery guidelines
Original title: 'BBCode'

Replies are listed 'Best First'.
Re: BBCode parser/validator needed
by Roger (Parson) on Sep 03, 2005 at 07:46 UTC
    You could try the BBCode::Parser module from CPAN, which seems to do what you want. Also, if you want an XHTML parser, there are quite a few on CPAN as well.

    Rule of thumb: always check CPAN first to see if there is already a module that does what you want. Chances are it already exists.

      ... or HTML::BBCode ;-)


      All code is usually tested, but rarely trusted.

