PerlMonks
Re: Keeping bad HTML bad
by adrianh (Chancellor) on Aug 24, 2002 at 20:35 UTC [id://192583]
You're going to have problems with HTML parsers since, as everybody has pointed out, it's not really HTML. If you can't force whoever (or whatever) is producing the broken HTML to stick to standards, the easiest alternative is to treat it as a string or a sequence of tags rather than as a tree structure. I had a similar problem several years back, which I resolved by simply adding special comments around the content that the user had to edit. Something like:
The "editable" stuff could then be extracted with some simple regexes. Without some more info on what kind of transformations you're trying to apply to the source, it's a little difficult to give more specific advice. Can you give us more of an idea of what you're trying to do?
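For what it's worth, here is a rough sketch of the comment-marker idea, written in Python rather than Perl (the regex itself carries over unchanged). The marker text `<!-- BEGIN EDITABLE -->` / `<!-- END EDITABLE -->` and the helper names `extract_editable` and `replace_editable` are made up for illustration; they are not the markers used in the original system. The point is that the page is handled as a plain string, so the broken markup outside the markers is never parsed and comes back out byte for byte.

```python
import re

# Hypothetical marker comments (an assumption for this sketch, not the
# markers from the original system).
BEGIN = "<!-- BEGIN EDITABLE -->"
END = "<!-- END EDITABLE -->"

# Non-greedy match between the markers; DOTALL lets "." span newlines.
EDITABLE_RE = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def extract_editable(page):
    """Return the text of every marked region, in document order."""
    return EDITABLE_RE.findall(page)


def replace_editable(page, transform):
    """Apply transform to each marked region; everything else is untouched."""
    return EDITABLE_RE.sub(
        lambda m: BEGIN + transform(m.group(1)) + END,
        page,
    )


# Deliberately broken HTML (unclosed <font>, stray </td>) outside the markers.
page = (
    "<table><font color=red></td>\n"
    "<!-- BEGIN EDITABLE --><p>Hello<!-- END EDITABLE -->\n"
    "<b>footer\n"
)

regions = extract_editable(page)
updated = replace_editable(page, str.upper)
```

This only works if the markers are never nested and never appear inside the editable text itself; a stricter version would have to check for that before substituting.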