Re: Hashing urls with Adler32

by Tomte (Priest)
on May 31, 2007 at 14:40 UTC (#618499)

in reply to Hashing urls with Adler32

Adler32 produces an output of fixed length (32 bits) for an input of arbitrary length. So an infinite set of possible inputs is mapped to a finite set of checksums, and the algorithm can't produce a unique checksum for every URL you feed it. I suggest you read the article on Wikipedia and then have a look at SHA1 as possibly the better solution. SHA1 will not be injective either (it can produce collisions too, they are just far less likely); whether this is a hindrance depends on your problem at hand.
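To make the collision point concrete, here is a minimal sketch in Python (not Perl, and not from the original post). The two URLs are made up for illustration: their last three bytes differ by (+1, -2, +1), which leaves both of Adler-32's running sums unchanged, so the checksums should come out identical while the SHA1 digests differ.

```python
import hashlib
import zlib

# Hypothetical URLs chosen for illustration. The last three path bytes
# differ by (+1, -2, +1): "bdf" -> "cbg". That keeps the plain byte sum
# and the position-weighted sum used by Adler-32 the same.
url_a = b"http://example.com/bdf"
url_b = b"http://example.com/cbg"

# zlib.adler32 returns an unsigned 32-bit integer.
adler_a = zlib.adler32(url_a)
adler_b = zlib.adler32(url_b)

# SHA1 gives a 160-bit digest (40 hex characters).
sha1_a = hashlib.sha1(url_a).hexdigest()
sha1_b = hashlib.sha1(url_b).hexdigest()

print(f"adler32: {adler_a:08x} vs {adler_b:08x}")
print(f"sha1:    {sha1_a} vs {sha1_b}")
```

Adler-32 was designed as a cheap integrity checksum, so near-identical strings like these collide easily. That matters for URLs, which tend to share long prefixes and differ in only a few bytes.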


An intellectual is someone whose mind watches itself.
-- Albert Camus

Replies are listed 'Best First'.
Re^2: Hashing urls with Adler32
by isync (Hermit) on May 31, 2007 at 15:16 UTC
    Currently I am using MD5 as the digest, but with lots of URLs the data structure is growing big.

    So I thought about reducing the bits per URL by using Adler32 instead.

    BTW: I am implementing a URL-seen structure here and need the hash to check against, while minimizing false positives/negatives.
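One possible way to trade size against false positives, sketched in Python under stated assumptions: keep MD5 but store only the first few bytes of each digest. The `SeenUrls` class, its `digest_bytes` parameter, and the `collision_probability` helper are hypothetical names introduced here, not anything from the thread. The helper uses the standard birthday approximation to estimate how likely at least one collision is for a given number of URLs and hash width.

```python
import hashlib
import math


class SeenUrls:
    """URL-seen set keyed on truncated MD5 digests (illustrative sketch)."""

    def __init__(self, digest_bytes=8):
        # Assumption: 8 bytes (64 bits) per URL; tune to your collision budget.
        self.digest_bytes = digest_bytes
        self._seen = set()

    def _key(self, url):
        # Keep only the leading bytes of the 16-byte MD5 digest.
        return hashlib.md5(url.encode("utf-8")).digest()[: self.digest_bytes]

    def add(self, url):
        """Record url; return True if its key was already present."""
        key = self._key(url)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False

    def __contains__(self, url):
        return self._key(url) in self._seen


def collision_probability(n_items, bits):
    """Birthday approximation: chance that any two of n_items share a hash."""
    return -math.expm1(-n_items * (n_items - 1) / (2 * 2 ** bits))


seen = SeenUrls()
print(seen.add("http://example.com/"))   # first visit
print(seen.add("http://example.com/"))   # repeat visit
print(collision_probability(100_000, 32))
print(collision_probability(100_000, 64))
```

A structure like this cannot give false negatives, because the same URL always maps to the same key. False positives come only from collisions, and the estimate suggests that with 100,000 URLs a 32-bit hash already has a better than even chance of at least one, while 64 bits keeps it negligible. A Python `set` of `bytes` carries its own per-entry overhead, so this shows the idea rather than the actual memory savings.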
