PerlMonks  


The title of your post should be YA piece of code that parses a limited and undefined subset of XML.

XML parsers can be validating (they check the document against a DTD) or non-validating (they only test well-formedness); see the XML spec. But a non-validating parser must still cover the whole spec. Nothing is optional in the XML spec. To claim to be an XML parser, your code must be able to deal with entities, CDATA sections, comments, processing instructions (PIs)...

A couple more cases that your code does not handle:

  • <doc att="toto>"/>: yes, > is legal inside a tag when it appears in an attribute value,
  • <doc><section><section></section></section></doc>: nested tags are perfectly legal and widely used, so a regexp like <tag>.*?</tag>, which is what you use to match an entire element, cannot work.
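
To make the two cases concrete, here is a minimal sketch of how regexps of that kind go wrong. The two patterns below are hypothetical stand-ins written for illustration; they are assumptions, not the code from the original post.

```perl
use strict;
use warnings;

# Hypothetical naive patterns, assumed for illustration only.
# A "tag" is taken to end at the first '>' ...
my $naive_tag     = qr/<(\w+)([^>]*)>/;
# ... and an "element" to end at the first matching close tag.
my $naive_element = qr{<section>(.*?)</section>}s;

# Case 1: '>' inside an attribute value.
my $doc1 = '<doc att="toto>"/>';
my ($name, $attrs) = $doc1 =~ $naive_tag;
# The tag is cut at the '>' inside the quotes,
# so the attribute value is truncated.
print "name=[$name] attrs=[$attrs]\n";

# Case 2: nested elements with the same name.
my $doc2 = '<doc><section><section></section></section></doc>';
my ($content) = $doc2 =~ $naive_element;
# The outer <section> is paired with the inner </section>,
# so the captured "content" is just the inner open tag.
print "content=[$content]\n";
```

In both cases the pattern matches without complaint and simply returns the wrong thing, which is why this sort of breakage tends to go unnoticed until the data changes.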

You should really study XML::SAX::PurePerl for a real XML parser in pure Perl (XML::SOAP::Lite does not parse all of XML, just the subset used by SOAP).

Once again: writing something that parses the specific XML data you have to deal with at the moment is usually quite easy, but writing a real XML parser is _HARD_. You can of course use this parser for your own data, but remember that it does not parse all of XML, and do not complain if one day it breaks on perfectly valid XML.

And please do not post the code, and above all do not pretend it is an XML parser. You are doing a disservice both to other monks and to yourself.

OK, I think I am done. I will now take off my XML ayatollah hat and resume my normal activities ;--)


In reply to Re: (YA) Perl XML parser by mirod
in thread (YA) Perl XML-like parser by belg4mit
