PerlMonks  

parsing xml file using LIBXML

by Madam (Sexton)
on May 12, 2005 at 06:33 UTC ( [id://456261]=perlquestion )

Madam has asked for the wisdom of the Perl Monks concerning the following question:

I have an XML file, recipe.xml, which I need to parse, and I am using the XML::LibXML parser.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<recipe name="Chocolate Chip Bars">
  <optional>vanilla,mango,nuts</optional>
  <step name="Preheat oven to 350 degrees"/>
  <step name="Melt butter" />
  <step name="combine with brown sugar and vanilla in large mixing bowl">
    <dependency use="vanilla"/>
  </step>
  <step name="combine with brown sugar and vanilla,nuts in large mixing bowl">
    <dependency use="vanilla,nuts,!mango"/>
  </step>
  <step name="combine with brown sugar and mango in large mixing bowl">
    <dependency use="mango,!vanilla,!nuts"/>
  </step>
  ...
</recipe>
<recipe name="chocolate cake">
....
This XML is just an example, but it is similar to the one I am actually using. I have an executable named 'recipe'. If I run 'recipe -name Chocolate Chip Bars', it should select the recipe named "Chocolate Chip Bars" and its contents; if I specify 'recipe -name chocolate cake', it should select the recipe named "chocolate cake" and its contents.

To select the proper contents, I specify the ingredients on the command line. With 'recipe -name Chocolate Chip Bars -use "vanilla"', it should select the step named "combine with brown sugar and vanilla in large mixing bowl". With 'recipe -name Chocolate Chip Bars -use "vanilla,nuts"', it should select the step named "combine with brown sugar and vanilla,nuts in large mixing bowl".

This is where I am facing the problem: what is the best way to select the "dependency" depending on what is specified on the command line?

My code:
use strict;
use warnings;
use XML::LibXML;

my $parser    = XML::LibXML->new();
my $recipeDoc = $parser->parse_file("recipe.xml");

foreach my $recipe ($recipeDoc->getElementsByTagName("recipe")) {
    my $recipeName = $recipe->getAttribute("name");
    foreach my $step ($recipe->getElementsByTagName("step")) {
        foreach my $dep ($step->getElementsByTagName("dependency")) {
            if ($dep->hasAttribute("use")) {
                # This is where I am stuck.
                # Here I have to get the value from the command line (I know
                # how to do that) and match it with the dependency value.
                # The issue is the matching itself, once I have the
                # command-line values.
            }
            my $stepName = $step->getAttribute("name");
        }
    }
}

Replies are listed 'Best First'.
Re: parsing xml file using LIBXML
by mirod (Canon) on May 12, 2005 at 08:18 UTC

    First, your file is not XML. Attribute values must be quoted, so <step name= Preheat oven to 350 degrees/> will cause the parser to die.

    Even if you fix this, I think your schema (or DTD or whatever) is not adapted. You use attributes where you should be using elements: name should be an element, possibly with an id attribute that would allow you to refer to it. Then you pack things in attributes that you will have to parse further: <dependency use="vanilla,nuts,!mango"/> should probably be written along the lines of: <dependency><use>vanilla</use><use>nuts</use><use type="optional">mango</use></dependency>.

    Once you do this, it will be much easier for you to query the document using XPath (read up on XPath: it should let you get the information you want directly, instead of using the 3 nested loops you have in your code). To accept several ingredients, I would use something like Getopt::Long and allow the use option to be given multiple times; from those values you can then build the proper XPath query.
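A minimal sketch of that idea, assuming the restructured layout suggested above (a name element under recipe, and one use element per ingredient under dependency). The helper name build_xpath is hypothetical, and the values are assumed not to contain double quotes, since they are interpolated into the query unescaped:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line style arguments into an XPath query.
# Assumes the restructured document: <recipe><name>..</name><step>
# <dependency><use>..</use>..</dependency></step></recipe>.
sub build_xpath {
    my @args = @_;
    my ($name, @use);

    # 'use=s' with an array destination lets the option be repeated.
    GetOptionsFromArray(\@args, 'name=s' => \$name, 'use=s' => \@use)
        or die "bad options\n";

    # Accept both "--use vanilla --use nuts" and "--use vanilla,nuts".
    @use = map { split /,/, $_ } @use;

    # Assumption: no double quotes inside the values (no escaping is done).
    my $recipe = defined $name ? qq{//recipe[name="$name"]} : '//recipe';
    return "$recipe/step" unless @use;

    # Every requested ingredient must appear as a <use> child of <dependency>.
    my $cond = join ' and ', map { sprintf 'use="%s"', $_ } @use;
    return "$recipe/step[dependency[$cond]]";
}

my @demo = ('--name', 'Chocolate Chip Bars', '--use', 'vanilla', '--use', 'nuts');
print build_xpath(@demo), "\n";
```

The resulting string is what you would hand to findnodes on the parsed document. Note that this query only checks that the requested ingredients are present; excluding ingredients would need an extra predicate.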

    It might even make sense to separate further and have the use element be an empty element with an idref to an element that will have the name of the ingredient:

    <dependency><use idref="vanilla"/> <use idref="nut"/><use type="optional" idref="mango"/> </dependency>
    further in the file (or in another one)
    <ingredients>
      <ingredient id="vanilla"><name>vanilla</name><description>...</description></ingredient>
      <ingredient id="nut"><name>nuts</name><description>...</description></ingredient>
      <ingredient id="mango"><name>mango</name><description>...</description></ingredient>
    </ingredients>

      Disclaimer: I like XML::Twig. A lot. But I really don't know anything about proper XML design. I just use XML that is defined by other programs and, thus, by other people.

      "Adapted." That's a new term to me. Do you have any URLs or anything (even if it's a link on xmltwig.org) on proper XML document design? I've never seen anything that describes when something should be an attribute vs a new subelement. I'd love to see something like that to learn from - so far, I just go by gut feel, and am probably wrong 50-70% of the time compared to "standard usage" :-)

        Sorry for the "adapted", it's from the French. Replace by "designed" and I hope it will make more sense.

        As for the old "attribute vs element" question: it has been the subject of countless threads, discussions, arguments... Instead of adding mine, I'll refer you to SGML/XML: Using Elements and Attributes; the comments there are very interesting.

Re: parsing xml file using LIBXML
by dakkar (Hermit) on May 12, 2005 at 12:57 UTC

    Various points to make...

    • as mirod says, that's an ugly piece of XML. But that's not the problem we're here to solve today...
    • getElementsByTagName works only on a Document object, not on any element: it finds all the elements with the given name anywhere in the document. I don't think that's what you need.
    • to do the match, you could use getAttribute() to get the value of the use attribute, and then use a regexp or something similar to test if it contains the value passed from the command line. Like:
      $attrValue =~ /\Q$param\E/
      where the \Q...\E is to avoid strange characters from being interpreted by the regexp engine.
    • learn XPath. Really do. That whole program can be reduced to a single XPath query. And it would be easier to read.
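One caveat with a plain substring test: /\Q$param\E/ with "vanilla" would also match "!vanilla" and "vanilla,nuts,!mango". A rough sketch of a stricter comparison follows, under one assumed reading of the question: a step matches when its non-negated ingredients are exactly the requested ones and none of its "!" ingredients was requested. The helper name dependency_matches is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: does a "use" attribute value match the requested
# ingredients? Assumed rule: the non-negated tokens must equal the requested
# set, and no "!"-prefixed token may be among the requested ingredients.
sub dependency_matches {
    my ($attr_value, @requested) = @_;
    my %requested = map { $_ => 1 } @requested;

    $attr_value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    my (%required, %excluded);
    for my $token (split /\s*,\s*/, $attr_value) {
        if ($token =~ /^!(.+)$/) { $excluded{$1}     = 1 }
        else                     { $required{$token} = 1 }
    }

    # Reject if any excluded ingredient was requested.
    return 0 if grep { $requested{$_} } keys %excluded;

    # Required and requested sets must be identical.
    return 0 unless keys(%required) == keys(%requested);
    return 0 if grep { !$requested{$_} } keys %required;
    return 1;
}

# The three dependency values from the question, tested against -use "vanilla,nuts".
for my $value ('vanilla', 'vanilla,nuts,!mango', 'mango,!vanilla,!nuts') {
    printf "%-22s %s\n", $value,
        dependency_matches($value, 'vanilla', 'nuts') ? 'match' : 'no match';
}
```

Inside the original loop this would be called as dependency_matches($dep->getAttribute("use"), @wanted), where @wanted holds the ingredients split out of the -use option. Only the second value matches for "vanilla,nuts", which is the selection the question asks for.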
    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)

Node Type: perlquestion [id://456261]
Approved by Zaxo