PerlMonks  

Re: XML::Simple giving a non-specific error

by almut (Canon)
on Mar 12, 2010 at 00:11 UTC


in reply to XML::Simple giving a non-specific error

I think that from a parser's point of view there isn't really much more that could be done (except maybe reporting the tag name).  Unless you have a DTD that disallows nesting of <ERROR> tags etc. (and XML::Simple actually performed validation), the parser cannot tell that there's an error (i.e. an unclosed <ERROR> tag) before having reached the end of the file.

Update: I've never used the module myself, but maybe XML::Simple::DTDReader can help with the issue...

Re^2: XML::Simple giving a non-specific error
by ikegami (Patriarch) on Mar 12, 2010 at 00:21 UTC

    You seem to be saying that being unable to determine that an error occurred before having reached the end of file means it can't be reported accurately. That's not the case, as seen in the update to my post.

    By the way, the error wasn't reported at the end of the file, it was reported when the closing tag of the parent element (</ROOT>) was found.

      By the way, the error wasn't reported at the end of the file

      Judging by the byte position (377), it was (the closing angle bracket of </ROOT> is byte 375(*)).  I don't know why the line number is reported one less than it should be — maybe the <?xml ...?> header isn't being counted.

      As for your other point, I think you're right, provided the parser keeps track of the starting positions of all tags that are still unclosed.

      ___

      (*) assuming unix newlines, which I did after having seen i386-linux-thread-multi in the OP's error message.

        If a guy catches the baseball at the edge of the outfield, it's not the edge of the outfield that caught the ball. Aside from the fact that it really was found before EOF (since at least the last newline and the EOF remain unparsed), the point was that the error could have been caught earlier, and would have been caught earlier (say if you had <ROOT><BODY><ERROR><ERROR></BODY></ROOT>).

        Besides, the following indicates the reported byte pos for me:

        </ROOT> ^ |
      That's not the case, as seen in the update to my post.

      Your update shows how LibXML, a parser which builds a tree (takes more memory), can provide better error messages than a simpler parser like expat.

        You seem to be implying that the fact that it builds a tree is relevant (if XML::LibXML::SAX even builds a tree). It's not. To be able to provide the error message it already provides, the parser needs a list of unclosed elements.
        my @unclosed = ( 'ROOT', 'ERROR', 'ERROR', );
        All that's needed to provide a better error message is to note a line number along with the name of the element.
        my @unclosed = ( [ 'ROOT', 3 ], [ 'ERROR', 8 ], [ 'ERROR', 8 ], );

        Yes, it uses extra memory, but 1) it doesn't add to the order of magnitude (O()) of the memory used, 2) the maximum used is proportional to the depth of the tree, and trees are usually quite shallow (20?).
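A minimal sketch of that bookkeeping, for illustration only. It assumes a toy regex-based scanner that understands nothing but plain start, end and empty-element tags; it is not how expat or LibXML work internally, and `check_nesting` is a hypothetical name. The point is just that the stack of unclosed elements can carry the opening line number at a cost of one extra scalar per level of nesting.

```perl
use strict;
use warnings;

# Toy tag scanner (NOT a real XML parser): it only recognises <NAME ...>,
# </NAME> and <NAME .../>, and ignores comments, CDATA, entities, etc.
# Returns an error message, or undef if every element is properly closed.
sub check_nesting {
    my ($xml) = @_;
    my @unclosed;    # stack of [ element name, line where it was opened ]
    my $line = 0;
    for my $text (split /\n/, $xml) {
        $line++;
        while ($text =~ m{<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>}g) {
            my ($closing, $name, $empty) = ($1, $2, $3);
            next if $empty;    # <FOO/> opens and closes at once
            if (!$closing) {
                push @unclosed, [ $name, $line ];
                next;
            }
            my $top = pop @unclosed
                or return "line $line: </$name> has no matching start tag";
            my ($open_name, $open_line) = @$top;
            return "line $line: found </$name>, but <$open_name> "
                 . "opened on line $open_line is still unclosed"
                if $open_name ne $name;
        }
    }
    return @unclosed
        ? "end of input: <$unclosed[-1][0]> opened on line $unclosed[-1][1] was never closed"
        : undef;
}

# Made-up input with the same kind of mistake as in the thread:
# the second <ERROR> should have been </ERROR>.
my $xml = join "\n",
    '<?xml version="1.0"?>',
    '<ROOT>',
    '  <ERROR>oops<ERROR>',
    '</ROOT>';
print check_nesting($xml), "\n";
# prints: line 4: found </ROOT>, but <ERROR> opened on line 3 is still unclosed
```

The error is still detected at </ROOT>, as discussed above, but the message can now point back to the line where the offending element was opened.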

        As for expat being simpler, it's actually almost identical to SAX. It wouldn't surprise me if one inspired the other.
