Re: Duplicate detection (SQL)


	PerlMonks


by zakzebrowski (Curate)

on Oct 20, 2003 at 12:15 UTC


in reply to Duplicate (similarity) detection (SQL)

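Also see the various Digest:: modules on CPAN. (Not a pure SQL solution.) Given a (string | binary data | undef), a digest function returns a fixed-length string that is effectively unique* to that input and is one-way. (One-way meaning that you cannot determine what the content was from the digest...) So just check whether the digests are the same for the various messages and you're done. Note that this only catches exact duplicates: change a single byte and the digest changes completely, so it won't help with messages that are merely similar.
* Assuming the digest method works. Because the digest has a fixed length, two different inputs can collide in principle, and some algorithms are better than others (MD5 versus MD4). MD5 and SHA are both publicly specified (MD5 in RFC 1321, SHA as a NIST standard), so neither is a proprietary algorithm.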
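
For what it's worth, here is a minimal sketch of the digest comparison described above, using Digest::MD5 from CPAN (also in the core distribution). The sample messages and the find_duplicates helper are made up for illustration; in practice the messages would come out of your database.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sample messages (illustration only).
my @messages = (
    "Hello, world",
    "Some other message",
    "Hello, world",
);

# Returns a list of [index, index_of_first_occurrence] pairs, one for
# each message whose digest matches that of an earlier message.
sub find_duplicates {
    my @msgs = @_;
    my (%first_seen, @dups);
    for my $i (0 .. $#msgs) {
        my $digest = md5_hex($msgs[$i]);
        if (exists $first_seen{$digest}) {
            push @dups, [$i, $first_seen{$digest}];
        }
        else {
            $first_seen{$digest} = $i;
        }
    }
    return @dups;
}

for my $pair (find_duplicates(@messages)) {
    print "message $pair->[0] duplicates message $pair->[1]\n";
}
```

If the data already lives in a table, one option (an assumption on my part, not something from the original question) is to store the hex digest in its own indexed column, so the comparison becomes a plain equality test in SQL.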

----
Zak

undef$/;$mmm="J\nutsu\nutss\nuts\nutst\nuts A\nutsn\nutso\nutst\nutsh\nutse\nutsr\nuts P\nutse\nutsr\nutsl\nuts H\nutsa\nutsc\nutsk\nutse\nutsr\nuts";open($DOH,"<",\$mmm);$_=$forbbiden=<$DOH>;s/\nuts//g;print;