http://qs321.pair.com?node_id=661275


in reply to Efficient partial deep cloning

Can you "flatten" each of the data structures? By that, I mean store everything in two ways simultaneously: in the actual structure, and also in a separate table that maps string representations of the paths to the appropriate value or variable in the structure. Or perhaps a reference to the actual value. Or separate the flattened representation into data and variables -- then, when you bind($s1,$s2), you'd scan through the keys to the table of variables, and look up each one in the flattened representation of the other value. It would be a simple string lookup in a hashtable at that point.

When to generate or update the flattened representation would depend on how these structures are populated. If you were to generate all of them at once, then yes, recursively walk the data structure. I don't think localizing the stack helps, though -- I think that would require copying the whole stack every time you localized it with the old value plus one new path token. Better to just push the token on and pop it back off after you're done. Actually, if you're doing it my way (with a string representation of the path), then you would just call bind(..., $path . $token) recursively and it will manage your stack for you.

And if you're maintaining the flattened version at all times, you just need to have the path so far available anywhere you're mutating things.

Why are you worried about this particular optimization already? It sounds like you aren't done with the rest of the algorithm. You might want to make it work first, just in case this optimization turns out to be invalid. Can variables be non-leaf nodes in your structures? If so, then you may have to do full searches rather than simple cached path lookups. Can the same variable appear more than once? If so, then you have to do speculative bindings or loop through all possibilities or something. Can the right-hand side also contain variables? Then you need to be able to bind the two variables together, and if they can appear more than once, I think the speculation gets hairier.

By the way, I think your function bind() should be called unify(). The terminology I am familiar with is that you unify the two structures, resulting in a set of bindings (possibly recorded via calls to bind()). But perhaps you're doing things differently from what I remember; I can't figure out when you would ever want to unify a single structure with multiple other structures in succession. Then again, it's been a very long time since I last thought about this stuff.