http://qs321.pair.com?node_id=46601

After reading a recent question, and some older ones, I thought it would be worth mentioning the basic rule of XML processing: Use a parser!

As I know you won't take my word for it, I will give you just a couple of examples of things that might (that will) go wrong if you use plain regexps:

Not to mention the usual kinds of problems with evolving XML: the content of the tag element starts including additional mark-up, the tag element gets a bunch of attributes, or tag2 elements start popping up in between tag elements.
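To make the point concrete, here is a minimal sketch of how a plain regexp and a real parser diverge on the same input. It is in Python rather than Perl, purely for illustration. The sample document, the element names doc and tag, and the two helper functions are all made up for this example; the regexp is just the kind of pattern people tend to write, not anybody's actual code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document: "doc" and "tag" are illustrative names only.
XML = """<doc>
  <tag>plain</tag>
  <tag id="t2">with an attribute</tag>
  <tag>AT&amp;T</tag>
  <tag><![CDATA[a < b]]></tag>
  <!-- <tag>commented out</tag> -->
</doc>"""


def regexp_texts(xml):
    """Naive approach: grab whatever sits between <tag> and </tag>."""
    return re.findall(r"<tag>(.*?)</tag>", xml, re.S)


def parser_texts(xml):
    """Parser approach: let the XML parser deal with the syntax."""
    root = ET.fromstring(xml)
    return [el.text for el in root.iter("tag")]


if __name__ == "__main__":
    print("regexp:", regexp_texts(XML))
    print("parser:", parser_texts(XML))
```

On this input the regexp version returns 'plain', 'AT&amp;T', '<![CDATA[a < b]]>' and 'commented out': it misses the element that has an attribute, leaves the entity and the CDATA section undecoded, and happily reports an element that is commented out. The parser version returns 'plain', 'with an attribute', 'AT&T' and 'a < b'. You could of course patch the regexp for each of these cases, but that is exactly the road to writing your own parser.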

You might think that you don't care about all of those: your XML is simple and you don't need no stinkin' namespaces. WRONG! You are limiting yourself to a subset of XML, but you are NOT calling it a subset. And either you or (pity them!) the people who will maintain your code won't remember that it is only a subset, or which subset. Plus you might have total control over this pseudo-XML today, but tomorrow? Maybe you will receive it from some external source, or you will use an off-the-shelf tool to create it.

Plus those extra features that your lovingly crafted regexps don't grok might come in handy in the future. Will you add them to your software? Will you end up writing your own regexp-based parser? It has been done, by the way; it's just that XML::Parser is faster for non-trivial XML, and I happen to trust James Clark (who wrote expat, the parser XML::Parser is built on) more than myself when it comes to writing a parser.

So please, anytime you want to process XML, especially if the software is going to be used for a while, please,

Use the Parser, Luke!