And more importantly, he knows which XML generators produced the output he is dealing with, so he doesn't have to account for every plausible case, only for the constructs those particular generators actually emit.
If you want to deal with XML in the general case, then you do have to parse, no way around it.
Makeshifts last the longest.
| [reply] |
Of course you can parse using regular expressions. What you shouldn't do is grope around in a string representing an XML document with isolated regular expressions, because you have to be certain about the context in which any match occurred. That means you have to scan the string strictly front-to-back, probably using the /gc modifiers and the \G anchor to make sure you don't skip anything. Simply picking matches out of the middle of the string is very likely to be a broken approach unless you are dealing with a known subset of XML syntax.
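To illustrate, here's a minimal sketch of that front-to-back scanning style: every pattern is anchored with \G at the spot where the previous match ended, and /gc keeps pos() from resetting on a failed alternative. It handles only a tiny, made-up subset of XML (no comments, CDATA, PIs, entities, or single-quoted attributes), so treat it as a demonstration of the technique, not a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A deliberately small example document.
my $xml = '<root><item id="1">foo</item><item id="2">bar</item></root>';

my @tokens;
while ((pos($xml) // 0) < length $xml) {
    if ($xml =~ /\G<\/([\w:-]+)>/gc) {               # closing tag
        push @tokens, [ close => $1 ];
    }
    elsif ($xml =~ /\G<([\w:-]+)((?:\s+[\w:-]+="[^"]*")*)\s*>/gc) {
        push @tokens, [ open => $1 ];                # opening tag ($2 holds raw attributes)
    }
    elsif ($xml =~ /\G([^<]+)/gc) {                  # text content
        push @tokens, [ text => $1 ];
    }
    else {
        # Anything \G can't account for is a hard error, not silently skipped.
        die "Unexpected input at offset ", pos($xml) // 0, "\n";
    }
}

print join(" ", map { "$_->[0]:$_->[1]" } @tokens), "\n";
# prints: open:root open:item text:foo close:item open:item text:bar close:item close:root
```

The point of the die branch is exactly the context guarantee above: because nothing is ever matched out of the middle of the string, any input the patterns don't recognize stops the scan instead of being misinterpreted.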
Makeshifts last the longest.
| [reply] |
I've done it as well, but I've been using Perl to parse HTML since 1994 and XML for four years. In those cases, I also had the file creator within spitting distance, "and he was a poor spitter, lacking both distance and control"(*), so I could literally beat any of them over the head if I wanted to. If I didn't have a lot of experience I'd never do it, and if the file provider isn't within strangling distance, I go with CPAN modules.
In short, you can do it, but you probably shouldn't do it.
(*) - P.G. Wodehouse, Money in the Bank
--
tbone1, YAPS (Yet Another Perl Schlub)
And remember, if he succeeds, so what.
- Chick McGee
| [reply] |