I have to give a big ++ to Corion for this advice. If you have malformed HTML, running it through Tidy will definitely make it far more usable. Although there is currently no Perl implementation of it (WHAH!), it is very easy to call from Perl with system(). If you have a lot of pages to process, you can write a simple Perl loop and run them through Tidy one after another. If this is part of an inline process, you can run each file through Tidy before you parse it or do whatever else with it. I'm currently building such an inline Tidy & Perl HTML::Parser step into an existing PHP process. If you have any questions, feel free to contact me.
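For anyone who wants a starting point, here's a rough sketch of that loop. It is not my production code, just an illustration under a few assumptions: the HTML Tidy `tidy` executable is installed and on your PATH, the helper names `tidy_output_path` and `run_tidy` are made up for this example, and writing the cleaned copy to `page.tidy.html` (rather than overwriting `page.html`) is simply my choice here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: name the tidied copy so the original is left alone
# (page.html -> page.tidy.html, notes -> notes.tidy).
sub tidy_output_path {
    my ($in) = @_;
    return $in =~ /^(.*)(\.html?)$/i ? "$1.tidy$2" : "$in.tidy";
}

# Hypothetical helper: run the external `tidy` program on one file.
# Assumes HTML Tidy is on the PATH. The list form of system() avoids the
# shell, so file names containing spaces are passed through intact.
sub run_tidy {
    my ($in, $out) = @_;
    # -q: suppress nonessential output; -o: write the cleaned HTML to $out
    my $rc = system('tidy', '-q', '-o', $out, $in);
    return 0 if $rc == -1;     # tidy could not be started at all
    return 0 if $? & 127;      # tidy was killed by a signal
    # Tidy exits 0 (clean), 1 (warnings only) or 2 (errors).
    return ($? >> 8) < 2;
}

# Process every file named on the command line, one after another.
for my $in (@ARGV) {
    my $out = tidy_output_path($in);
    if (run_tidy($in, $out)) {
        print "tidied $in -> $out\n";
        # ... hand $out to HTML::Parser (or whatever you use) here ...
    } else {
        warn "tidy failed on $in\n";
    }
}
```

Run it as `perl tidyall.pl *.html`. If overwriting the originals is fine for you, Tidy's `-m` (modify in place) option does that instead of `-o`.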
-THRAK
www.polarlava.com