PerlMonks  

Parse XML file

by blackdragoen (Novice)
on Sep 20, 2007 at 10:45 UTC ( [id://640084]=perlquestion )

blackdragoen has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: Parse XML file
by erroneousBollock (Curate) on Sep 20, 2007 at 14:31 UTC
    Blackdragoen, you've already been given the advice (in the CB) that you should use XPath to find such nodes. You've been told that your lack of understanding of XML is the problem. To reiterate, there's no such thing as an 'end tag' in XML. The nodes you're interested in are nodes with neither children nor attributes.

    You should parse the document with XML::Parser and pass the result to XML::XPath with the appropriate query expression. I've already given you a hint for the XPath expression: *[not(*)] will give you all the nodes without children.
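A minimal sketch of that suggestion, assuming XML::XPath is installed and is handed the XML text directly (it drives XML::Parser itself). The sample document and the variable names are hypothetical and not taken from the original question; the hint expression is prefixed with // so that it applies at every depth.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document, not from the original question.
my $xml = <<'XML';
<root>
  <item id="1">text</item>
  <empty/>
  <group><leaf/></group>
</root>
XML

# XML::XPath parses the string itself (it uses XML::Parser underneath).
my $xp = XML::XPath->new(xml => $xml);

# '//*[not(*)]' selects every element that has no element children.
# Elements with only text or attributes (like <item>) still match.
my @names = map { $_->getName } $xp->findnodes('//*[not(*)]');

print "$_\n" for @names;
```

Note that this expression only rules out child elements; an element that carries text or attributes, such as item in the sample, is still selected.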

    Have you applied this suggestion yet, or researched XPath as you were advised?

    -David

      This worked for me:

      print $_->getName."\n" for $xp->findnodes('//*[not(child::*) and not(attribute::*) and not(string(.))]');

      -David

        For HTML, many folks do use that moniker for markup of the form </name>. In the HTML DOM (and the XML DOM for that matter) no construct maps to that markup; it's merely an artifact of DOM serialisation. An equivalent serialisation might employ whitespace (such as indentation) to demarcate nested nodes. There is no way to "search" for such an entity in the DOM.

        I personally don't mind it when people call markup of that form an 'end tag' in HTML because it's possible to construct invalid documents that will usually render correctly in most HTML browsers.

        It makes no sense to refer to "tags" for XML as it's not possible to make use of an invalid XML document. The DOM is constructed from (possibly) nested Nodes (of various types) and string-like values attached to those Nodes.

        And finally, the OP was referring to XML nodes of the form <name />, which is not at all the same as HTML of the form </name>. The former represents a Node in the DOM without child nodes, attributes or a value; the latter does not represent any node, attribute or value in the DOM.

        -David [erroneousBollock isn't logged in]
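To illustrate the point about serialisation, here is a small sketch, again assuming XML::XPath. It runs the stricter expression from the reply above against two hypothetical one-element documents, one written as <name /> and one as <name></name>. Both spellings describe the same empty node, so both should be matched by the query; the closing markup itself is never something the query can see.

```perl
use strict;
use warnings;
use XML::XPath;

# Expression quoted from the reply above: no child elements,
# no attributes, and an empty string value.
my $expr = '//*[not(child::*) and not(attribute::*) and not(string(.))]';

# Two hypothetical serialisations of the same empty element.
my %docs = (
    short => '<doc><name /></doc>',
    long  => '<doc><name></name></doc>',
);

my %found;
for my $form (sort keys %docs) {
    my $xp = XML::XPath->new(xml => $docs{$form});
    # Collect the names of all matching elements for this document.
    $found{$form} = [ map { $_->getName } $xp->findnodes($expr) ];
    print "$form: @{ $found{$form} }\n";
}
```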

Re: Parse XML file
by shoness (Friar) on Sep 20, 2007 at 13:31 UTC
    I'm not sure I understand completely what you want to do, but your answer lies with XML::Parser (or XML::Twig).

Node Type: perlquestion [id://640084]
Approved by Corion