Valid XML and XML::Simple

by mrguy123 (Hermit)
on Oct 28, 2007 at 09:01 UTC ( [id://647674]=perlquestion )

mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have been using the Perl Module XML::Simple quite a lot recently to parse XML and to test if XML is valid and well formed. Everything works great except for one problem.
Sometimes the XML I need to parse has ":" inside the tags, according to certain namespaces. For example
    <person>
      <person:name>Joe</person:name>
      <person:job>programmer</person:job>
    </person>
This is not valid XML according to W3C standards. If you save this as an XML file, and try to open it, you get an error. Also, if you go to http://www.w3schools.com/dom/dom_validate.asp and try to validate the XML, you get this error: "reference to undeclared namespace 'person'". My problem is that XML::Simple does not consider this XML invalid. For example, this code does not return an error:
    #!/exlibris/metalib/m4_b/product/bin/perl
    use strict;
    use XML::Simple;

    my $source_code = "<person>
      <person:name>Joe</person:name>
      <person:job>programmer</person:job>
    </person>";

    my $xs = new XML::Simple();
    my $hash;

    ##This should return an error!!
    eval {$hash = $xs->XMLin($source_code)};
    if ($@){
        print "$@";
        exit(0);
    }
Since I am now working on a project to transform invalid XML to valid XML, and am also using a C XML parser that returns an error for this sort of XML, I have a problem. I know how to make the XML valid, but I also want XML::Simple to fail if the XML is invalid. Does anyone have any ideas what to do?
Thanks,
Guy

---A truth that's told with bad intent beats all the lies you can invent
Update: Fixed typo XML::simple -> XML::Simple

Update 2: Added an example from w3schools

Replies are listed 'Best First'.
Re: Valid XML and XML::Simple
by GrandFather (Saint) on Oct 28, 2007 at 09:41 UTC

    Where does w3c say that that is not valid XML? The XML spec in part says:

    [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    [5] Name ::= (Letter | '_' | ':') (NameChar)*

    and:

    The Namespaces in XML Recommendation XML Names assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

    which makes it pretty clear that colons not only are allowed, but must be accepted by XML processors.


    Perl is environmentally friendly - it saves trees
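The distinction GrandFather draws (colons are legal name characters in plain XML, and only namespace processing gives them meaning) can be seen with XML::Parser directly. The following is a minimal sketch, assuming XML::Parser is installed; the helper name `parse_error` is made up for this illustration. XML::Parser leaves namespace processing off unless its documented `Namespaces` option is set to a true value.

```perl
use strict;
use warnings;
use XML::Parser;

# The sample document from the question: the 'person' prefix is never declared.
my $xml = <<'XML';
<person>
  <person:name>Joe</person:name>
  <person:job>programmer</person:job>
</person>
XML

# Hypothetical helper: returns '' if the document parses,
# otherwise the error text raised by the parser.
sub parse_error {
    my (%opts) = @_;
    my $parser = XML::Parser->new(%opts);
    return eval { $parser->parse($xml); 1 } ? '' : ($@ || 'unknown error');
}

my $plain = parse_error();                 # namespace processing off (default)
my $ns    = parse_error(Namespaces => 1);  # namespace processing on

print "without Namespaces: ", ($plain eq '' ? "parsed OK" : "error: $plain"), "\n";
print "with Namespaces:    ", ($ns    eq '' ? "parsed OK" : "error: $ns"), "\n";
```

With namespace processing off, the colon is treated as an ordinary name character, so the document is accepted as well formed. With it on, the unbound `person` prefix is reported as an error.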
      Hi,
      I'm assuming this is not black and white, but the fact is that if you save my example as an XML file, and try to open it, you get an error. It also fails when I try to parse it in "C".
        The foo:thing syntax relates to namespaces. An XML parser must accept such syntax (here the node is a 'thing' in the 'foo' namespace).

        The document should be considered invalid because the 'person' namespace has not been declared at the top of the XML document.

        However, XML::Simple is (as its name implies) a simplistic approach to parsing XML... it's not exactly a strict parser. It may or may not give you a warning about a missing 'prefix'.

        Use a more compliant parser such as XML::Parser (directly), XML::Twig or XML::LibXML.

        -David
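As a sketch of David's suggestion, here is how the check might look with XML::LibXML, assuming that module is installed. The helper name `libxml_error` and the namespace URI `http://example.com/person` are placeholders invented for this example, not anything from the original post. Without the `recover` option, XML::LibXML is expected to raise an exception for the undeclared prefix.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns '' if the string parses,
# otherwise the parser's error text.
sub libxml_error {
    my ($xml) = @_;
    my $doc = eval { XML::LibXML->load_xml(string => $xml) };
    return defined $doc ? '' : ("$@" || 'unknown error');
}

# The prefix 'person' is used but never declared.
my $undeclared = '<person><person:name>Joe</person:name></person>';

# Same data with the prefix bound; the URI is a made-up placeholder.
my $declared = '<person xmlns:person="http://example.com/person">'
             . '<person:name>Joe</person:name></person>';

for my $case ([undeclared => $undeclared], [declared => $declared]) {
    my ($label, $xml) = @$case;
    my $err = libxml_error($xml);
    print "$label: ", ($err eq '' ? "parsed OK" : "rejected: $err"), "\n";
}
```

Declaring the prefix on the root element is one way to make the sample acceptable to a namespace-aware parser.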

      Hi again
      I added this to the description: if you try to validate this XML on the w3schools page (http://www.w3schools.com/dom/dom_validate.asp), you get an error.
Re: Valid XML and XML::Simple
by pKai (Priest) on Oct 28, 2007 at 12:24 UTC

    I can confirm your symptom (i.e. "no error") when I force XML::Simple to use XML::Parser as its parser. Maybe your configuration uses XML::Parser as its default? Consult the "ENVIRONMENT" section in XML::Simple's documentation.

    If I let XML::Simple use XML::SAX as parser (which is my default, since it is installed), I get

    Undeclared prefix: person at E:/perl/site/lib/XML/NamespaceSupport.pm line 298.

    Which is "somewhat" similar to the message Anonymous Monk reports in the first reply in this thread (but who -- judging from his message -- uses some other parser).
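The parser selection pKai describes is controlled by the package variable `$XML::Simple::PREFERRED_PARSER` (or the environment variable `XML_SIMPLE_PREFERRED_PARSER`), as documented in XML::Simple's ENVIRONMENT section. Below is a rough sketch for comparing back ends on the sample document. It assumes XML::Simple, XML::Parser and XML::SAX (which ships XML::SAX::PurePerl) are installed; the helper name `try_backend` is invented for this example.

```perl
use strict;
use warnings;
use XML::Simple;

# The sample document from the question.
my $xml = <<'XML';
<person>
  <person:name>Joe</person:name>
  <person:job>programmer</person:job>
</person>
XML

# Hypothetical helper: parse with the named back end and return ''
# on success, or the error text if XMLin() died.
sub try_backend {
    my ($backend) = @_;
    local $XML::Simple::PREFERRED_PARSER = $backend;
    my $data = eval { XMLin($xml) };
    return defined $data ? '' : ($@ || 'unknown error');
}

for my $backend ('XML::Parser', 'XML::SAX::PurePerl') {
    my $err = try_backend($backend);
    print "$backend: ", ($err eq '' ? "no error" : "error: $err"), "\n";
}
```

If the two back ends disagree on your system, that would match the symptom described above: the result depends on which parser XML::Simple ends up using, not on XML::Simple itself.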

      Thanks for the tip. I am trying to force XML::SAX, but am getting this error
      Can't locate object method "new" via package "XML::SAX" at /exlibris/product/perl-5.8.7/lib/site_perl/5.8.7/XML/SAX/ParserFactory.pm line 43.
      I will keep working on it
        perldoc XML::SAX

        NAME
            XML::SAX - Simple API for XML

        SYNOPSIS
            use XML::SAX;

            # get a list of known parsers
            my $parsers = XML::SAX->parsers();

            # add/update a parser
            XML::SAX->add_parser(q(XML::SAX::PurePerl));

            # remove parser
            XML::SAX->remove_parser(q(XML::SAX::Foodelberry));

            # save parsers
            XML::SAX->save_parsers();
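One plausible reading of the error above, offered as an assumption and not a diagnosis: XML::Simple treats any preferred-parser value other than 'XML::Parser' as the class name of a SAX parser and hands it to XML::SAX::ParserFactory, so the value 'XML::SAX' itself would be instantiated as if it were a parser class. A minimal sketch, assuming XML::SAX and XML::Simple are installed, that lists the registered parser classes and selects one of them:

```perl
use strict;
use warnings;
use XML::SAX;
use XML::Simple;

# XML::SAX->parsers() returns an array ref of hashes;
# the Name key holds each parser's class name.
my @names = map { $_->{Name} } @{ XML::SAX->parsers() };
print "Registered SAX parsers: @names\n";

# Name a concrete parser class (not 'XML::SAX' itself). The empty string
# tells XML::Simple to fall back to its default selection rules.
$XML::Simple::PREFERRED_PARSER = @names ? $names[-1] : '';

# A small well-formed document, just to confirm the chosen parser loads.
my $data = eval { XMLin('<person><name>Joe</name></person>') };
print defined $data ? "name: $data->{name}\n" : "parse failed: $@\n";
```

Picking the last registered entry is an arbitrary choice for this sketch; any class name printed in the list could be used instead.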
Re: Valid XML and XML::Simple
by Anonymous Monk on Oct 28, 2007 at 09:32 UTC
    Check your vision
    Can't locate object method "new" via package "XML::simple" (perhaps you forgot to load "XML::simple"?)
    If I fix that, I get
    unbound prefix at line 2, column 3, byte 11
      Sorry, it's supposed to be XML::Simple
        XML Parsing Error: prefix not bound to a namespace
        Line Number 2, Column 3:
        <person:name>Joe</person:name>
        --^
        Keep reading!
        If I fix that, I get
        unbound prefix at line 2, column 3, byte 11

Approved by GrandFather