
Brief synopsis: I'm pushing a bunch of unstructured data into an API call that returns an XML page. Here's a snippet:

<?xml version="1.0" encoding="UTF-8"?>
<results>
    <url>--removed--</url>
    <language>english</language>
    <text>---removed---</text>
    <taxonomy>
        <category>
            <label>/vehicle brands/jeep</label>
        </category>
        <category>
            <label>/travel</label>
        </category>
    </taxonomy>
    <keywords>
        <keyword>
            <text>rear extended bumpstops</text>
        </keyword>
    </keywords>
</results>

So far, I've been using XML::Simple to strip the header, and I've been trying to parse the data as such:

my $xml = new XML::Simple (KeyAttr=>[]);
my $TopList = $xml->XMLin($result);

$SQL = "INSERT INTO categories (category, url) VALUES(?, ?)";
$SQLX = $dbh->prepare($SQL);

if ($TopList->{taxonomy}) {
    foreach my $cat (@{$TopList->{taxonomy}->{category}}) {
        $SQLX->execute($cat->{label}, $db_url);
    }
    $SQLX->finish();
}

The difficulty is that the XML output is unpredictable. I may not have ANY <category> entries, based on the data. And sometimes, the output comes at me with a 2-word 'keyword' formation, or a hyphenated value. So essentially, I have TWO problems / questions:

1) I need to make a valid check to see if there is actually an entry for the subheading I'm looking for, such as the line above:

if ($TopList->{taxonomy})

This line has never barfed at me, but the next line has, so I need to know if I'm doing this check correctly. Reading part 2 might give you a bit more context for this part of my question...

2) The next line in the code barfs at me quite a bit, where I dereference to drill down:

foreach my $cat (@{$TopList->{taxonomy}->{category}})

Sometimes it barks that it's not a hash, so I change it, and then it barks that it's not an array. During a chat earlier, I was told this is a common problem with XML::Simple. It was suggested that I use XML::Twig, or the ForceArray option. I originally was asking if I could simply use an else clause with my foreach statement. Something like:

foreach $var (@{$TopList->{taxonomy}->{category}}) { do something; }
else foreach $var ($TopList->{taxonomy}->{category}) { do the something this way; }

The more I think about that else clause, the less it makes sense, but the more I think it would be a quick fix to my problem (thus, further cementing the idea that it won't work). Either way, I haven't gotten the syntax to work yet, so I thought I would ask:

How would the Perl Monks do this?

I should also add that I'm on a very tight deadline, so quick and simple is better than complicated but elegant.

Help me, Perl Monks, You're My Only Hope!
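
For what it's worth, here is a minimal sketch of one way both questions could be handled without a foreach/else, assuming the parsed structure holds nothing for a missing element, a hash reference for a single <category>, and an array reference for several. The helper names as_list and category_labels are made up for illustration (they are not part of XML::Simple), and %samples is hand-written stand-in data rather than real XMLin() output. Passing ForceArray => ['category', 'keyword'] to XML::Simple, as suggested in the chat, is another way to get an array every time.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple): turn "missing",
# "one item" or "many items" into a plain list.
sub as_list {
    my ($node) = @_;
    return ()     unless defined $node;        # element absent
    return @$node if ref $node eq 'ARRAY';     # several elements
    return ($node);                            # exactly one element
}

# Hypothetical helper: collect every category label, or nothing at all.
sub category_labels {
    my ($top) = @_;
    my $taxonomy = $top->{taxonomy};
    return () unless ref $taxonomy eq 'HASH';  # no usable <taxonomy>
    return map  { $_->{label} }
           grep { ref $_ eq 'HASH' && defined $_->{label} }
           as_list( $taxonomy->{category} );
}

# Hand-written stand-ins for the three shapes described above.
my %samples = (
    none => { language => 'english' },
    one  => { taxonomy => { category => { label => '/travel' } } },
    many => { taxonomy => { category => [
                  { label => '/vehicle brands/jeep' },
                  { label => '/travel' },
              ] } },
);

for my $name (sort keys %samples) {
    my @labels = category_labels( $samples{$name} );
    print "$name: ", scalar(@labels), " label(s)\n";
}
```

With something like this in place, the insert loop would shrink to a single statement such as $SQLX->execute($_, $db_url) for category_labels($TopList); and the separate if check on {taxonomy} would no longer be needed.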


In reply to Help with Parsing XML output by khrome
