PerlMonks  

Re: Parsing XML with XML::Simple

by ferreira (Chaplain)
on Dec 18, 2006 at 00:54 UTC


in reply to Parsing XML with XML::Simple

The problem is that your XML file is not well-formed. A well-formed document has a single
root element, which is the ancestor of every other element. Something like this:

    <root>
      <CVS> $Id: File_Find.pl,v 1.1 2006-12-17 19:25:03 eric Exp $ </CVS>
      <DATE>2006-12-10</DATE>
      ...
      <ARTICLE> foo bar baz </ARTICLE>
    </root>

I think that with this simple correction, XML::Simple will work for you. By default,
the root element is discarded, so the first level you get back is a hash with the keys you want:
CVS, ARTICLE, DATE, etc.
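
To illustrate the point about the root element, here is a minimal sketch. It assumes the file contents are a bare fragment with no XML declaration; the variable names and the root element name are placeholders, not taken from the original question.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical fragment: several top-level elements and no single root.
my $fragment = <<'XML';
<CVS>$Id: File_Find.pl,v 1.1 2006-12-17 19:25:03 eric Exp $</CVS>
<DATE>2006-12-10</DATE>
<ARTICLE>foo bar baz</ARTICLE>
XML

# Wrap the fragment in an arbitrary root element so it becomes well-formed.
# With the default options (KeepRoot off), XMLin drops that root again.
my $data = XMLin("<root>$fragment</root>");

# The child element names are the first-level hash keys.
print "$_\n" for sort keys %$data;
```

If the file starts with an `<?xml ...?>` declaration, wrapping it this way would not work as-is, because the declaration would no longer be at the start of the document.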

You can always open your XML files in a typical browser (Firefox, Opera, IE, etc.)
to see whether they are well-formed or whether an error is reported.

Replies are listed 'Best First'.
Re^2: Parsing XML with XML::Simple
by madbombX (Hermit) on Dec 18, 2006 at 01:03 UTC
    You hit the nail on the head. Everything I had worked just fine. The problem is that when I ran it through Firefox, I noticed that at a few points throughout the articles, I have the author emails in the following format:
    First Last <this@that.com>

    That messed everything up and all my original code actually works. Is there a way around that using XML::Simple or XML::Twig so that I don't have to go through EVERY file and remove all instances of that?

      The big issue is that if you have First Last <this@that.com> within your XML, you have bad XML. (It should be First Last &lt;this@that.com&gt;.) It is better to fix these files. I am not sure how you came into this, because XML::Simple usually escapes these things:
      $ perl -MXML::Simple -e "print XMLout({ a => 'a <b>' })"
      <opt a="a &lt;b&gt;" />

      It could be the version you're using. The example above used:

      $ which_pm XML::Simple
      XML::Simple 2.13 c:/tools/apache/Perl/site/lib/XML/Simple.pm
        I am using XML::Simple version 2.14. I ran into this because company policy requires our CVS headers to include that line, as part of the template for documents, scripts, etc. that go into CVS. Therefore, the XML files that are turned into articles all have it within the top 5 lines. Neither XML::Simple nor XML::Twig handles this properly.
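
Since the files cannot simply drop the address line, one possible workaround is to escape it before parsing. The sketch below is only an assumption about what such a pre-processing step might look like: the helper name `escape_email_brackets` is hypothetical, and the pattern is deliberately loose (anything of the form `<something@something>` with no whitespace inside), not a full e-mail grammar.

```perl
use strict;
use warnings;

# Escape angle brackets that wrap something shaped like an e-mail address,
# so "First Last <this@that.com>" becomes "First Last &lt;this@that.com&gt;".
# Ordinary tags such as <DATE> contain no "@" and are left untouched.
sub escape_email_brackets {
    my ($text) = @_;
    $text =~ s/<([^\s<>\@]+\@[^\s<>\@]+)>/&lt;$1&gt;/g;
    return $text;
}

my $line = 'First Last <this@that.com>';
print escape_email_brackets($line), "\n";   # First Last &lt;this@that.com&gt;
```

The same substitution could be applied to a file's contents before they are handed to XMLin or XML::Twig. It would be worth checking the result on a few files first, since the pattern could also match other bracketed text that happens to contain an `@`.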
