
POE and recursion

by OverlordQ (Hermit)
on Sep 02, 2007 at 22:18 UTC ( [id://636643] )

OverlordQ has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks.

Currently I'm writing a POE script that parses certain sets of input upon request. The input is, thankfully, easy-to-parse XML. However, the problem I'm running into is that some of the data in the XML file contains what we could call 'See Also' records, which point to additional files that need to be fetched and parsed, which may also contain 'See Also' records, and so on.

Now, with a non-POE script I'd just keep a count of how many 'See Also' docs remain to be parsed and keep fetching/parsing until it reaches 0, but with POE's event-driven nature I'm confused as to how to store the temporary data and how to know when I've reached the 'bottom', so I know it's safe to start the actual processing of the XML.
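A minimal sketch of that counting idea in an event-driven setting, under stated assumptions: a toy FIFO queue of code references stands in for POE's event loop, `%see_also` is hypothetical data in place of the fetched XML, and `request_doc`, `handle_response` and `crawl` are illustrative names, not POE API. The counter goes up when a fetch is started and down when its response has been handled (after any 'See Also' fetches it triggers have been started), so reaching zero means nothing is left in flight. In a real POE session the `$state` hash would naturally live in the session's heap.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the fetched XML files: each document
# name maps to the 'See Also' documents it references.
my %see_also = (
    root => [qw(a b)],
    a    => [qw(c root)],   # points back at root: a cycle
    b    => [],
    c    => [],
);

# Toy event queue standing in for an event loop such as POE's kernel;
# each entry is a code ref that runs "later", like a posted event.
my @event_queue;

sub request_doc {
    my ($state, $doc) = @_;
    return if $state->{seen}{$doc}++;      # never fetch the same doc twice
    $state->{pending}++;                   # one more fetch in flight
    push @event_queue, sub { handle_response($state, $doc) };
}

sub handle_response {
    my ($state, $doc) = @_;
    push @{ $state->{parsed} }, $doc;      # keep the temporary data
    request_doc($state, $_) for @{ $see_also{$doc} };
    $state->{pending}--;                   # this fetch is finished
    # Zero outstanding fetches means we have reached the 'bottom'.
    $state->{on_done}->($state->{parsed}) if $state->{pending} == 0;
}

sub crawl {
    my ($start, $on_done) = @_;
    my $state = { pending => 0, seen => {}, parsed => [], on_done => $on_done };
    request_doc($state, $start);
    return $state;
}

my $result;
crawl('root', sub { $result = shift });
(shift @event_queue)->() while @event_queue;   # drain the toy event loop
print "parsed: @$result\n";                    # parsed: root a b c
```

The `%seen` check matters because 'See Also' records can point back at a document already fetched; without it the crawl would never reach zero.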

Any ideas and suggestions would be greatly appreciated.

See this node for an accurate description of what I'm trying to accomplish.

Re: POE and recursion
by rcaputo (Chaplain) on Sep 02, 2007 at 22:47 UTC

    You need to resolve some details before I can provide a useful answer. How are you currently handling the XML in POE? Must the referenced XML files be processed before the referencing file continues?

      What I'm doing is parsing Wikipedia categories. Using Physics as the example input, the script would request this page.
      • The page elements with an ns attribute of zero are actual articles, and will need to be saved for processing later.
      • The page elements with an ns attribute of 14 are sub-categories, which need to be traversed as well.
      E.g., the first one on the page is Category:Applied and interdisciplinary physics, so the script will then request this page.

      Then, as above, all the page elements with an ns attribute of zero are articles and are saved for later, and the elements with an ns attribute of 14 are added to the list of sub-categories to walk.
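The per-page split described above (ns 0 saved as articles, ns 14 queued as sub-categories) can be sketched as follows. This assumes each page element has already been parsed out of the XML into a hashref with `ns` and `title` keys; `classify_pages` and the titles other than Category:Applied and interdisciplinary physics are hypothetical.

```perl
use strict;
use warnings;

# Split already-parsed page elements by their ns attribute:
# ns 0 = article (save for later), ns 14 = sub-category (walk it too).
# Assumption: each element arrives as a hashref { ns => ..., title => ... }.
sub classify_pages {
    my @pages = @_;
    my (@articles, @subcats);
    for my $page (@pages) {
        if    ($page->{ns} == 0)  { push @articles, $page->{title} }
        elsif ($page->{ns} == 14) { push @subcats,  $page->{title} }
        # elements in any other namespace are skipped
    }
    return (\@articles, \@subcats);
}

my ($articles, $subcats) = classify_pages(
    { ns => 0,  title => 'Hypothetical article' },
    { ns => 14, title => 'Category:Applied and interdisciplinary physics' },
    { ns => 2,  title => 'Hypothetical page in another namespace' },
);
print "articles: @$articles\n";
print "sub-categories: @$subcats\n";
```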

      Once all the subcategories are traversed, operations are performed on the list of unique articles.

      Basically, given the input category, the script should walk all the branches of the tree 'below' it.
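One way to walk every branch below the input category without recursion is an explicit first-in, first-out work list. This is a sketch under assumptions: `fetch_members` is a hypothetical synchronous stand-in for fetching and parsing one category page, and `%tree` is made-up data (only Category:Physics and Category:Applied and interdisciplinary physics come from the thread). A `%seen_cat` hash stops a category reached by two paths from being walked twice, and `%seen_article` keeps the article list unique.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetched category pages:
# category => [ [article titles], [sub-category titles] ].
my %tree = (
    'Category:Physics' => [
        ['Article A'],
        ['Category:Applied and interdisciplinary physics',
         'Category:Hypothetical sub-category'],
    ],
    'Category:Applied and interdisciplinary physics' => [
        ['Article B'], ['Category:Hypothetical sub-category'],
    ],
    'Category:Hypothetical sub-category' => [ ['Article A', 'Article C'], [] ],
);

# Returns (arrayref of articles, arrayref of sub-categories) for one category.
sub fetch_members {
    my ($cat) = @_;
    my $entry = $tree{$cat} || [ [], [] ];
    return @$entry;
}

# Walk every category below $root using a work list instead of recursion.
sub walk_categories {
    my ($root) = @_;
    my @todo = ($root);
    my (%seen_cat, %seen_article, @articles);
    $seen_cat{$root} = 1;
    while (defined(my $cat = shift @todo)) {
        my ($arts, $subcats) = fetch_members($cat);
        push @articles, grep { !$seen_article{$_}++ } @$arts;   # unique articles
        push @todo,     grep { !$seen_cat{$_}++ }     @$subcats; # unwalked cats only
    }
    return @articles;
}

my @articles = walk_categories('Category:Physics');
print "@articles\n";   # Article A Article B Article C
```

In an event-driven version, each `fetch_members` call would become an asynchronous request, and the loop's "work list is empty" exit would be replaced by a count of requests still in flight reaching zero.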

        This kind of recursion can be rewritten as iteration. Here's some untested code: