http://qs321.pair.com?node_id=906322


in reply to Re^2: Using hashes for set operations...
in thread Using hashes for set operations...

In Perl stringification of references has no performance penalty. It's just a string with the reference address¹ and the type (this includes package name if blessed).
perl -e '$p=[];print $p'
ARRAY(0x928c880)
(I think you are confusing with other language like JS², where the whole data is dumped)
No, I'm thinking that it is pointless to compare references since any two copies will test as unequal. Instead, you must manually write something that stringifies (or hashes, in the other sense of the word) to a canonical form in order to then test for equivalence.

I guess that depends on what the user intends, so the FAQ should point out that using a reference (or object) as a hash key will stringify it as you show, and so performs the same test as == against the reference itself (the address).
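A quick sketch of that point, with variable names made up for illustration: two separately built arrays with the same contents give different hash keys, while a copy of the same reference gives the same key, which matches what == reports.

```perl
use strict;
use warnings;

my $x = [1, 2, 3];
my $y = [1, 2, 3];   # same contents, but a different array
my $z = $x;          # a copy of the same reference

# The key is the stringified reference, something like "ARRAY(0x...)".
my %seen = ( "$x" => 1 );

my $y_found = exists $seen{"$y"} ? 1 : 0;   # different address, so not found
my $z_found = exists $seen{"$z"} ? 1 : 0;   # same address, so found

# Numeric comparison of references compares their addresses.
my $x_eq_y = ($x == $y) ? 1 : 0;
my $x_eq_z = ($x == $z) ? 1 : 0;

print "y found: $y_found, x == y: $x_eq_y\n";
print "z found: $z_found, x == z: $x_eq_z\n";
```

This assumes plain unblessed references; an object whose class overloads stringification or == could behave differently.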

I think intersection and friends should be like sort, in that they can take a piece of code that is used to determine what is meant by equivalence in this particular case. That's easy to call but can be inefficient; and just like you use the whatever maneuver with sort to cache the keys, you could do the same with intersection. But the eventual module can have that built-in, as your ideas directly incorporate that kind of keying. Then the user needs to provide code to produce a canonical key of one item, as opposed to comparing the equivalence of two parameters.
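For the sort side of that analogy, here is a minimal sketch of one common key-caching idiom (compute each key once, sort on the cached key, then discard it). I am not claiming it is the specific idiom meant above, and expensive_key is a hypothetical stand-in for whatever costly computation produces the key.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for an expensive key computation.
sub expensive_key {
    my ($s) = @_;
    $calls++;
    return lc $s;
}

my @words = qw/pear Apple fig Banana/;

# Pair each item with its key, sort on the cached key, then drop the key.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ expensive_key($_), $_ ] }
    @words;

print "@sorted\n";
print "key function called $calls times\n";
```

The key function runs once per element rather than once per comparison, which is the property a keyed intersection would want as well.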

But back to the underlying code: If I want two ad-hoc uses of [qw/1 2 3/] to be considered the same, stringifying the reference won't do it. It needs to call a function to generate the string key from the contents. And we suppose that this is expensive, so only call it once per value in each input list.
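As a rough illustration, assuming flat arrays of plain scalars: canonical_key below is a hypothetical helper, and joining on "\0" is just one possible choice of canonical form that assumes no element contains that character.

```perl
use strict;
use warnings;

# Hypothetical key function: builds a string from the contents of a flat
# array of plain scalars. Assumes no element contains the separator "\0".
sub canonical_key {
    my ($aref) = @_;
    return join "\0", @$aref;
}

my $p = [qw/1 2 3/];
my $q = [qw/1 2 3/];

# Stringified references differ because the two arrays live at different addresses.
my $refs_match = ("$p" eq "$q") ? 1 : 0;

# Keys built from the contents agree.
my $keys_match = (canonical_key($p) eq canonical_key($q)) ? 1 : 0;

print "refs match: $refs_match, keys match: $keys_match\n";
```

Note that this key is order-sensitive, so [qw/3 2 1/] would not be considered the same; whether it should be is up to whoever supplies the function.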

The user wants to find the intersection of two lists, so he would be told to pass @set1 and @set2, and optionally a &func, which defaults to built-in stringification. Prepare your internal %set1 from @set1 and func(each element), and arrange the code (at least in the case where a func is passed -- it could have different implementations) to not need to call func again on some value but to always keep it with the key.
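A minimal sketch of that arrangement, under my own assumptions about the interface: the name intersection, passing the lists as array references, and the optional third argument are all hypothetical, not an existing module's API.

```perl
use strict;
use warnings;

# Hypothetical interface: intersection(\@set1, \@set2, $keyfunc).
# $keyfunc is optional and defaults to plain stringification.
sub intersection {
    my ($set1, $set2, $keyfunc) = @_;
    $keyfunc ||= sub { "$_[0]" };

    # Key each element of the first list once, keeping the value with its key.
    my %set1;
    for my $item (@$set1) {
        my $key = $keyfunc->($item);
        $set1{$key} = $item unless exists $set1{$key};
    }

    # Key each element of the second list once and look it up.
    my (@result, %emitted);
    for my $item (@$set2) {
        my $key = $keyfunc->($item);
        next unless exists $set1{$key};
        next if $emitted{$key}++;       # report each common key only once
        push @result, $set1{$key};      # return the element from the first list
    }
    return @result;
}

my @first  = ([1, 2, 3], [4, 5]);
my @second = ([4, 5], [6], [1, 2, 3]);

# Default keying compares addresses, so nothing is shared.
my @by_ref = intersection(\@first, \@second);

# Keying on contents finds the two arrays that hold the same values.
my @by_contents = intersection(\@first, \@second, sub { join "\0", @{ $_[0] } });

print scalar(@by_ref), " common by reference, ",
      scalar(@by_contents), " common by contents\n";
```

The key function is called exactly once per element of each input list, and the original value is stored alongside its key so it never has to be recomputed.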