http://qs321.pair.com?node_id=146186


in reply to shameful reg expression

You should give the HTML::Parser module a try. Otherwise, I don't know what your regex looks like (or where in it you're capturing the text), but you might try placing this into your existing regex:
([^><]+)
which captures a run of one or more characters that are neither > nor <. Keep in mind, however, that it has to be placed correctly in the regex (for example, right after the > that closes a tag); otherwise it will also give you matches like "td" and "b".
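To show why placement matters, here is a minimal sketch. The sample markup and the variable names are made up for illustration, since I haven't seen your actual regex or data, and the anchored version is only one possible way to place the class.

```perl
use strict;
use warnings;

# Hypothetical sample markup, not taken from the original question.
my $html = '<td><b>Hello</b> world</td>';

# Unanchored: the class also matches the tag names sitting between < and >.
my @loose = $html =~ /([^><]+)/g;
# @loose is ('td', 'b', 'Hello', '/b', ' world', '/td')

# Anchored: require a > before the run and a < right after it,
# so only the text between two tags is captured.
my @between = $html =~ />([^><]+)(?=<)/g;
# @between is ('Hello', ' world')
```

The lookahead (?=<) checks for the next tag without consuming its <, so text sitting between two further tags can still be matched.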

But again, this won't work in every situation, even if you write a damned good regex: a > inside an attribute value or a comment is enough to throw it off. For best results, look into HTML::Parser.
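For comparison, a rough sketch of the HTML::Parser route, assuming the module is installed from CPAN. The helper name extract_text and the sample markup are my own inventions for illustration; adapt the handler to whatever you actually need to capture.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text chunks found between tags.
sub extract_text {
    my ($html) = @_;
    my @chunks;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the handler the text with entities decoded
        text_h => [ sub { push @chunks, $_[0] }, 'dtext' ],
    );
    # Ask for each run of text in one piece instead of arbitrary fragments.
    $parser->unbroken_text(1);
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed
    return @chunks;
}

my @text = extract_text('<td><b>Hello</b> world &amp; more</td>');
print join('', @text), "\n";
```

The parser does the work of deciding what is a tag and what is text, so nothing here depends on where a > or < happens to appear.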