Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re^2: Long url to tiny url.

by almut (Canon)
on Jul 07, 2007 at 10:03 UTC ( [id://625404]=note: print w/replies, xml ) Need Help??


in reply to Re: Long url to tiny url.
in thread Long url to tiny url.

If not, you use some random method to generate tinyURLs until you get one that's not already in your database ...

Personally (i.e. I don't know if they are doing that way), I would just count issued tinyURLs (the count might come directly from the database), and represent that number in some number base, e.g. 62 for the printable chars you've specified.

The advantage would be that it's simpler in the following ways: (1) I wouldn't have to check whether some randomly generated number is already in use, and (2) would be able to have the number of required digits grow just as needed, so the tinyURL is always as compact as possible. (With a random number, I'd have to choose ahead what width to use. Also, as full utilisation of the set is being approached, searching for a remaining free number would require more and more iterations...).

Replies are listed 'Best First'.
Re^3: Long url to tiny url.
by Polonius (Friar) on Jul 07, 2007 at 14:45 UTC

    If you just issue the next available alphanumeric sequence, as you say, the URL is always as small as possible. But the problem with that, or any other predictable sequence, is that it's open to abuse. Conversely, if you use a random number generator with a short period, as you say you'll get lots more collisions, while a long period means the URLs will be longer than necessary.

    Polonius

      Good point! (It always surprises me how you monks can turn anything into an interesting discussion :)

      Another way to deal with the problem of predictability would be to apply some 'salting' mechanism. For example, in its most simple case, prepend (or append) a random digit to the number, and use that to XOR the other digits (many other algorithms are conceivable, of course). That would considerably reduce predictability, while retaining most of the desirable properties of the simple counting approach.

      Yet another way would be to just check every tinyURL against a list of potentially vulgar or otherwise abusive words, and, if found, just skip that particular tinyURL. The advantage of that would also be that innocent users of the service wouldn't accidentally be assigned URLs, which might require them to add apologies or further explanations when sending out such an URL...

Re^3: Long url to tiny url.
by RMGir (Prior) on Jul 07, 2007 at 10:49 UTC
    Very nice. That would work quite well, yes.

    A better variant might be to seed some random number generator with a very long period, then generate the output strings -- that way, the consecutive tinyurl's wouldn't be predictable, and it would still be just as cheap to pre-generate them.

    If the generator can take its output as its seed, then all you need to do is save the last output value so you can restart the generator when it's time to generate the next set...


    Mike

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://625404]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2024-04-25 18:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found