How to make a fingerprint from an Object

jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to make a fingerprint from an Object by nothingmuch (Priest) on May 07, 2007 at 14:01 UTC
If the objects are "simple", concatenate all the fields and run the result through Digest, then cache the digest. Be sure to concatenate hash values sorted by key if you intend to compare digests between instances of perl, since hash ordering is not stable across invocations. If the objects have deeply nested structures, look at Data::Structure::Util, or investigate Object::Signature, which uses Storable under the hood and returns a digest of the serialized data. Likewise, cache the result in an additional attribute of the object. -nuffin
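A minimal sketch of the "simple object" case described above, assuming a flat hash-based object whose values are plain byte strings or numbers. The helper name `fingerprint` and the `_fingerprint` cache slot are hypothetical, not part of any module mentioned in this thread.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: fingerprint a flat, hash-based object.
sub fingerprint {
    my ($obj) = @_;

    # Return the cached digest if one was already computed.
    return $obj->{_fingerprint} if defined $obj->{_fingerprint};

    # Sort the keys so the result does not depend on hash ordering,
    # and leave the cache slot itself out of the digest.
    my @keys = grep { $_ ne '_fingerprint' } sort keys %$obj;

    # Join key/value pairs with a separator unlikely to occur in the data.
    my $string = join "\034", map { ($_, $obj->{$_} // '') } @keys;

    return $obj->{_fingerprint} = md5_hex($string);
}

my $first  = bless { name => 'foo', size => 42 }, 'Thing';
my $second = bless { size => 42, name => 'foo' }, 'Thing';
print fingerprint($first) eq fingerprint($second) ? "same\n" : "different\n";
```

The cached value goes stale as soon as a field changes, so any setter would need to delete `_fingerprint` for the digest to stay meaningful.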
Re: How to make a fingerprint from an Object by Moron (Curate) on May 07, 2007 at 17:44 UTC
You could use Digest::MD5, but that only operates on a single string. In general, an object is a blessed reference to a compound data structure, which doesn't automatically fit. So the real challenge is to find an unambiguous way to convert the data structure into a single string that can be fed to MD5. Data::Dumper ~~won't~~ can guarantee a stable key ordering (see tinita's reply below). ~~So you'd need to sort the structure by hash key before delimiting, although arrays should be used in existing order.~~ (`$;`, which defaults to "\034" (ASCII 28, non-printing), is also a useful delimiter for printable data.) That said, fingerprinting the whole object shouldn't be necessary: it should be sufficient to fingerprint an ordered, delimited concatenation of selected instance fields. It is more usual (though not mandatory) for these to be the primary key in the logical data model rather than some bulk data field.
Update: You could also consider storing the MD5 in the database, setting a UNIQUE constraint on some selection of fields and letting the database deal with the problem.
Free your mind!
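As a rough illustration of fingerprinting only selected instance fields, here is a sketch that joins them in a fixed order with `$;`. The field names `customer_id` and `order_date` and the helper `key_fingerprint` are made up for the example; substitute whatever fields form the logical key of your class.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical: the fields that define identity for this class.
my @KEY_FIELDS = qw(customer_id order_date);

# Digest of the key fields only, in a fixed order, delimited by $;
# (the subscript separator, "\034" unless it has been changed).
sub key_fingerprint {
    my ($obj) = @_;
    return md5_hex( join( $;, map { $obj->{$_} // '' } @KEY_FIELDS ) );
}

my $order = bless {
    customer_id => 7,
    order_date  => '2007-05-07',
    notes       => 'bulk data that is ignored by the fingerprint',
}, 'Order';

print key_fingerprint($order), "\n";
```

Because the field list is fixed, no key sorting is needed, and fields outside the list can change without affecting the fingerprint.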
Re^2: How to make a fingerprint from an Object by tinita (Parson) on May 07, 2007 at 20:42 UTC
"Data::Dumper won't guarantee a unique key ordering"
Oh, it does =) Just set `$Data::Dumper::Sortkeys` to 1.
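A small sketch of how that setting could be combined with Digest::MD5 to fingerprint a nested structure. The helper name `deep_fingerprint` is hypothetical, and setting `$Data::Dumper::Indent` to 0 is an extra choice made here to keep the dump compact, not something the reply above asks for.

```perl
use strict;
use warnings;
use Data::Dumper;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: digest of a dump with sorted hash keys.
sub deep_fingerprint {
    my ($obj) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable hash key order
    local $Data::Dumper::Indent   = 0;    # single-line output
    return md5_hex( Dumper($obj) );
}

my $one = bless { id => 1, tags => { red => 1, blue => 2 }, list => [ 3, 4 ] }, 'Thing';
my $two = bless { list => [ 3, 4 ], tags => { blue => 2, red => 1 }, id => 1 }, 'Thing';

print deep_fingerprint($one) eq deep_fingerprint($two) ? "same\n" : "different\n";
```

The dump includes the class name from `bless`, so objects of different classes with identical data get different digests under this approach.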
Re: How to make a fingerprint from an Object by doom (Deacon) on May 08, 2007 at 00:03 UTC
You don't make it entirely clear what you mean by a "duplicate" object. If you're interested in finding objects with identical data, then yes, an MD5 fingerprint (recomputed whenever the data changes) would be a decent solution. If you're looking for copies of the same object, then you can just compare the references themselves, e.g. check `scalar $object`, which by default stringifies to the class name plus the object's memory address.
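For the "same object" case, one possible sketch uses `refaddr` from the core module Scalar::Util instead of relying on stringification, since a class may overload `""` or `==`. The helper name `same_object` is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: true if both variables refer to the same object.
sub same_object {
    my ($left, $right) = @_;
    return refaddr($left) == refaddr($right);
}

my $obj   = bless { name => 'foo' }, 'Thing';
my $alias = $obj;                                # second reference, same object
my $clone = bless { name => 'foo' }, 'Thing';    # equal data, different object

print same_object($obj, $alias) ? "identical\n" : "distinct\n";
print same_object($obj, $clone) ? "identical\n" : "distinct\n";
```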
Re^2: How to make a fingerprint from an Object by Dervish (Friar) on May 08, 2007 at 03:35 UTC
Of course, you might want to do a bit more data analysis before you jump into using a hash, since creating the hash will, by definition, take longer than simply examining the two structures once. So whether or not the hash speeds up your code depends on what you want to do with it. That said, MD5 is a good hashing algorithm for this purpose. Accidental collisions are very unlikely, the algorithm itself is fairly simple (and fast), and if your structure is significantly bigger than the hash key size, comparing many hash keys can be a lot faster than comparing many structures. We use it where I work to shortcut the parsing of some multi-megabyte structures when the user asks to include them many times.
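To illustrate where the digest pays off, here is a sketch that buckets many objects by digest, so each object is serialized once instead of being compared pairwise. The helpers `digest_of` and `find_duplicates` are hypothetical, and the Data::Dumper-based digest is just one possible choice.

```perl
use strict;
use warnings;
use Data::Dumper;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: digest of a dump with sorted hash keys.
sub digest_of {
    my ($obj) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    return md5_hex( Dumper($obj) );
}

# Group objects by digest. Returns a list of array references, one per
# digest that was seen more than once.
sub find_duplicates {
    my (@objects) = @_;
    my %bucket;
    push @{ $bucket{ digest_of($_) } }, $_ for @objects;
    return grep { @$_ > 1 } values %bucket;
}

my @things = map { bless { name => $_ }, 'Thing' } qw(foo bar foo);
my @groups = find_duplicates(@things);
print scalar(@groups), " group(s) of duplicates\n";
```

If a false match would be costly, the members of each group can still be compared field by field afterwards.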
Re: How to make a fingerprint from an Object by rblasch (Monk) on May 08, 2007 at 16:48 UTC
Have a look at the source of File::Find::Duplicates. Maybe you can adapt the way it finds duplicate files to your needs.
Re^2: How to make a fingerprint from an Object by jeanluca (Deacon) on May 10, 2007 at 08:50 UTC
It seems using Digest::MD5 is the way to go; however, I see a big difference between taking an MD5 of an object and of a file. For example, you might not want to use all the data from the objects for the fingerprint! LuCa
Re: How to make a fingerprint from an Object by valdez (Monsignor) on May 22, 2007 at 12:52 UTC
I would use Data::UUID to generate a universally unique identifier during object creation. You would store that object identifier and use it later for comparison; there is then no need to compute an object fingerprint later.

```perl
package Class;

use strict;
use warnings;
use Data::UUID;

sub new {
    my $class = shift;
    my $uuid  = Data::UUID->new;    # UUID generator

    return bless {
        some             => 'data',
        object_signature => $uuid->create_str(),    # assigned once, at creation
    }, $class;
}

# Accessor for the identifier assigned in new().
sub object_signature { shift->{object_signature} }

1;
```

HTH, Valerio

