Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Idea for XPath implementation

by BrowserUk (Patriarch)
on Apr 02, 2003 at 10:30 UTC ( [id://247460]=note: print w/replies, xml ) Need Help??


in reply to Idea for XPath implementation

From your later replies I can now see the problem that you are trying to solve, which I admit I didn't really appreciate at first.

I tried to see how to apply your structure to an XML file that contained repetitous elements

Eg.

<users> <user> <id>fred</id> <pass>crude</pass> </user> <user> <id>barney</id> <pass>polite</pass> </user> </users>

And I think I could see it working for a simple repetitious grouping as shown here, once you start getting groups within groups, you end up with a huge, tautological mess that isn't even easy to see how to construct or search using nested recursion. It certainly won't be quick or easy.

The only reasonably simply solution that I can think of is to build an single-level array, with each unique path in the data represented using a string representation of its path, and a second array of arrays that is a fully inverted index of the first.

my @paths = ( "/colours/paprika/1/good/red", "/colours/paprika/1/bad/green", "/colours/paprika/2/good/yellow", "/colours/paprika/2/bad/brown", "/colours/banana/yellow" ); my %invert = ( colours=>[0,1,2,3,4], paprika=>[0,1,2,3], banana=>[4], good=>[0,2], bad=>[1,3], red=>[0], green=>[1], yellow=>[2,4], brown=>[3], );

It's fairly easy to see how for any given query path, you can split it into its sub elements, look up each part in the array of arrays, do a little set arithmetic to locate those paths in the simple array that contains all the requisite matches, and then use the results to find the associated data.

Building the inverted index can be achieved on-the-fly as the paths array is being constructed, or en-mass afterwards. The only slightly tricky part is remembering the next available index for groups of data, especially when these can be nested, but the final representation should be considerably less memory hungry than the equivalent hash and array tree, and the lookup of partial match query paths quite a lot faster too.


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://247460]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (4)
As of 2024-03-29 00:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found