Re: xml parsing without using cpan modules

Replies are listed 'Best First'.
Re^2: xml parsing without using cpan modules by Aristotle (Chancellor) on Aug 10, 2004 at 15:30 UTC
And more importantly, he knows the XML generators whose output he is dealing with, which means he doesn't have to account for all plausible cases, only those that the generators in question actually produce. If you want to deal with XML in the general case, then you do have to parse; there is no way around it. Makeshifts last the longest.
Re^3: xml parsing without using cpan modules by iburrell (Chaplain) on Aug 10, 2004 at 19:07 UTC
It is possible to write an XML parser using regular expressions. Check out "REX: XML Shallow Parsing with Regular Expressions", http://www.cs.sfu.ca/~cameron/REX.html. It even has Perl code for doing so. It effectively splits the XML into a list of strings on logical boundaries by repeatedly applying a regular expression that matches XML markup. It is then fairly easy to find the type of each chunk by looking at its first couple of characters.
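To make the "split into chunks, then look at the first couple of characters" idea concrete, here is a rough sketch. It is not the REX expression from the paper, only a much cruder stand-in: it assumes no `>` ever appears inside attribute values, comments, CDATA sections or processing instructions. The names `shallow_parse` and `chunk_type` are made up for this example.

```perl
use strict;
use warnings;

# Crude shallow parser (NOT the REX regex): split the input into markup
# chunks and text chunks. Assumes '>' never occurs inside markup.
# Call it in list context to get the chunks back.
sub shallow_parse {
    my ($xml) = @_;
    return $xml =~ /(<[^>]*>|[^<]+)/g;
}

# Classify a chunk by looking at its first couple of characters.
sub chunk_type {
    my ($chunk) = @_;
    return 'comment' if $chunk =~ /^<!--/;
    return 'pi'      if $chunk =~ /^<\?/;
    return 'decl'    if $chunk =~ /^<!/;
    return 'end'     if $chunk =~ m{^</};
    return 'start'   if $chunk =~ /^</;   # also covers empty tags like <br/>
    return 'text';
}

my @chunks = shallow_parse('<a href="x">hi <b>there</b></a>');
print chunk_type($_), ": $_\n" for @chunks;
```

Since every character ends up in exactly one chunk, joining the chunks gives back the original string, which makes for a cheap sanity check on well-behaved input.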
Re^4: xml parsing without using cpan modules by Aristotle (Chancellor) on Aug 10, 2004 at 19:22 UTC
Of course you can parse using regular expressions. You just shouldn't grope around in a string representing an XML document using regular expressions, because you have to be certain about the context in which any match occurred. That means you have to scan the string strictly front-to-back, probably using the `/gc` options and the `\G` anchor to make sure you don't miss anything. Simply picking matches out of the middle of the string is very likely to be a broken approach unless you are dealing with a known subset of XML syntax. Makeshifts last the longest.
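A minimal sketch of what such a front-to-back scan might look like. It handles only a small subset of XML (no CDATA, no DOCTYPE, no entity handling, and a looser notion of names than the spec allows), and the sub name `tokenize` and the token layout are just choices made for this example.

```perl
use strict;
use warnings;

# Simplified patterns: real XML names and attribute values are stricter.
my $name = qr/[\w:.-]+/;
my $attr = qr/\s+$name\s*=\s*(?:"[^"<]*"|'[^'<]*')/;

# Scan strictly front to back. \G pins every match to the spot where the
# previous one ended, and /c keeps pos() in place when an alternative fails.
sub tokenize {
    my ($xml) = @_;
    my @tokens;
    pos($xml) = 0;
    while (pos($xml) < length $xml) {
        if    ($xml =~ /\G<!--(.*?)-->/gcs)     { push @tokens, [comment => $1] }
        elsif ($xml =~ /\G<\?(.*?)\?>/gcs)      { push @tokens, [pi      => $1] }
        elsif ($xml =~ m{\G</($name)\s*>}gc)    { push @tokens, [end     => $1] }
        elsif ($xml =~ m{\G<($name)((?:$attr)*)\s*(/?)>}gc) {
            push @tokens, [ ($3 ? 'empty' : 'start') => $1 ];
        }
        elsif ($xml =~ /\G([^<]+)/gc)           { push @tokens, [text    => $1] }
        else  { die "Unrecognized input at offset " . pos($xml) . "\n" }
    }
    return @tokens;
}

my @tokens = tokenize('<a><!-- <b> not a tag --><c x="1>2"/>text</a>');
print "$_->[0]: $_->[1]\n" for @tokens;
```

Note that a bare `/<b>/` search on the same input would happily match inside the comment, and a naive `<[^>]*>` would cut the `c` tag short at the `>` in its attribute value. Because the scanner always knows what kind of construct it is in, neither happens here, and anything it cannot account for makes it die instead of being skipped silently.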
Re^5: xml parsing without using cpan modules by iburrell (Chaplain) on Aug 10, 2004 at 19:38 UTC
Re^5: xml parsing without using cpan modules by Nalina (Monk) on Aug 11, 2004 at 06:29 UTC
Re^6: xml parsing without using cpan modules by mirod (Canon) on Aug 11, 2004 at 12:39 UTC
Re^2: xml parsing without using cpan modules by tbone1 (Monsignor) on Aug 10, 2004 at 15:46 UTC
I've done it as well, but I've been using Perl to parse HTML since 1994 and XML for four years. In those cases, I also have the file creator within spitting distance, "and he was a poor spitter, lacking both distance and control"(*), so I could literally beat any of them over the head if I wanted to. If I didn't have a lot of experience I'd never do it, and if the file provider isn't within strangling distance, I go with CPAN modules. In short, you can do it, but you probably shouldn't. (*) P.G. Wodehouse, Money in the Bank -- tbone1, YAPS (Yet Another Perl Schlub) And remember, if he succeeds, so what. - Chick McGee

