Creating your own Tinyurl style uid

抹茶落季 2020-12-02 11:30

I'm writing a small article on human-readable alternatives to GUIDs/UIDs, for example those used on TinyURL for the URL hashes (which are often printed in magazines, so n

8 answers
  • 2020-12-02 12:02

    The probability of a collision against one specific ID is:

    p = ( 0.5 * ( (0.5*1/10) + (0.5*1/26) ) )^6
    

    which is around 1.7×10^-9.

    The probability that a new ID collides with at least one of n existing IDs is 1-(1-p)^n, so you'll have roughly a 0.17% chance of a collision for each new insertion after 1 million IDs have been inserted, around 1.7% after 10 million IDs, and around 16% after 100 million.

    1000 IDs/minute works out to about 43 million/month, so as Sklivvz pointed out, using some incrementing ID is probably going to be a better way to go in this case.

    EDIT:

    To explain the math: he's essentially flipping a coin and then picking a digit or a letter, 6 times over. For two IDs to agree at one position, the coin flips must match (probability 0.5); given that, half the time both are digits, with a 1/10 chance of matching, and half the time both are letters, with a 1/26 chance of matching. That happens 6 times independently, so you multiply those probabilities together.
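
    As a quick check of the figures above, here is a minimal Python sketch that evaluates p and the chance that one new ID matches at least one of n existing IDs, 1-(1-p)^n. The function name `collision_chance` is made up for this illustration, and the model is only the coin-flip scheme described in this answer.

```python
# Per-position match probability under the scheme described above:
# both IDs must pick the same kind (digit or letter) and then the same symbol.
p_char = 0.5 * (0.5 * (1 / 10) + 0.5 * (1 / 26))
p = p_char ** 6  # all six positions must match independently


def collision_chance(existing_ids: int, p_match: float = p) -> float:
    """Chance that one new ID matches at least one of `existing_ids` stored IDs."""
    return 1.0 - (1.0 - p_match) ** existing_ids


print(f"p = {p:.3e}")
for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} existing IDs: {collision_chance(n):.2%}")
```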

  • 2020-12-02 12:10

    Some time ago I did exactly this, following the approach Sklivvz mentioned. The whole logic was implemented as a SQL Server stored procedure and a couple of UDFs (user-defined functions). The steps were:

    • Say that you want to shorten this URL: Creating your own Tinyurl style uid
    • Insert the URL into a table
    • Obtain the @@identity value of the last insert (a numeric id)
    • Transform the id into a corresponding alphanumeric value, based on a "domain" of letters and numbers (I actually used this set: "0123456789abcdefghijklmnopqrstuvwxyz")
    • Return that value, something like 'cc0'

    The conversion was done by a couple of very short UDFs.

    Two conversions called one after the other would return "sequential" values like these:

    select dbo.FX_CONV (123456) -- returns "1f5n"
    
    select dbo.FX_CONV (123457) -- returns "1f5o"
    

    If you are interested, I can share the UDF code.
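
    The UDF itself is not shown in this thread, so the following is only a sketch of what such a conversion might look like, written in Python rather than T-SQL. It assumes a plain positional base-36 encoding over the character set quoted above; `encode_id` and `decode_id` are hypothetical names, not the author's functions.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # the set quoted in the answer


def encode_id(number: int, alphabet: str = ALPHABET) -> str:
    """Write a non-negative integer in base len(alphabet)."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_id(text: str, alphabet: str = ALPHABET) -> int:
    """Inverse of encode_id: turn the short string back into the numeric id."""
    base = len(alphabet)
    value = 0
    for ch in text:
        value = value * base + alphabet.index(ch)
    return value


print(encode_id(123456), encode_id(123457))
print(decode_id("cc0"))
```

    With this plain base-36 scheme, 123456 and 123457 come out as 2n9c and 2n9d. Those are consecutive, like the values quoted above, but not the same strings, so the original UDFs presumably map ids in some other way that is not shown here.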

  • From Wikipedia:

    When printing fewer characters is desired, GUIDs are sometimes encoded into a base64 or Ascii85 string. Base64-encoded GUID consists of 22 to 24 characters (depending on padding), for instance:

    7QDBkvCA1+B9K/U0vrQx1A
    7QDBkvCA1+B9K/U0vrQx1A==
    

    and Ascii85 encoding gives only 20 characters, e.g.:

    5:$Hj:Pf\4RLB9%kU\Lj 
    

    So if you are concerned with uniqueness, a base64-encoded GUID gets you somewhat closer to what you want, though it's not 6 characters.

    It's best to work in bytes first, then translate those bytes into hexadecimal for display, rather than working with characters directly.
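
    The lengths quoted above are easy to reproduce. The sketch below uses Python's standard `uuid` and `base64` modules; the GUID value is an arbitrary example chosen for this illustration, not one taken from the quote.

```python
import base64
import uuid

guid = uuid.UUID("12345678-1234-5678-1234-567812345678")  # arbitrary example value
raw = guid.bytes  # work with the 16 raw bytes first

padded = base64.b64encode(raw).decode("ascii")    # includes the trailing "=="
unpadded = padded.rstrip("=")                     # padding stripped
ascii85 = base64.a85encode(raw).decode("ascii")   # Ascii85 form
as_hex = raw.hex()                                # plain hexadecimal for display

print(len(padded), len(unpadded), len(ascii85), len(as_hex))

# The padding can be restored before decoding, so nothing is lost by dropping it.
restored = uuid.UUID(bytes=base64.b64decode(unpadded + "=="))
print(restored == guid)
```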

  • 2020-12-02 12:11

    If you're using 6 characters drawn from a-z and 0-9, that's an alphabet of 36 characters. The number of possible IDs is thus 36^6, which is 2176782336, so a randomly generated ID should clash with any one given existing ID only 1 time in 2176782336.
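
    For illustration, a uniformly random 6-character ID over that 36-character alphabet could be generated as below. This is a sketch under that uniformity assumption; `random_id` is a hypothetical helper, and Python's `secrets` module is used here simply as a convenient source of unbiased random choices.

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
ID_LENGTH = 6
SPACE_SIZE = len(ALPHABET) ** ID_LENGTH  # 36 ** 6 possible IDs


def random_id() -> str:
    """Pick each of the six characters uniformly from the alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


print(SPACE_SIZE)
print(random_id())
```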

  • 2020-12-02 12:14

    Look up the Birthday Paradox; it's exactly the problem you're running into.

    The question is: how many people do you need to get together in a room so that there is a 50% chance that some two of them share a birthday? The answer may surprise you: just 23.
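
    To make the birthday argument concrete, the sketch below counts how many uniformly random picks it takes before a duplicate becomes at least 50% likely, first for 365 birthdays and then for 6-character IDs over a 36-character alphabet. The function name is invented for this example, and uniform, independent picks are assumed.

```python
def picks_until_collision_likely(space_size: int, target: float = 0.5) -> int:
    """Smallest number of uniformly random picks from `space_size` values
    for which the chance of at least one duplicate reaches `target`."""
    p_all_distinct = 1.0
    picks = 0
    while 1.0 - p_all_distinct < target:
        # The next pick must avoid the `picks` values already taken.
        p_all_distinct *= (space_size - picks) / space_size
        picks += 1
    return picks


print(picks_until_collision_likely(365))      # the classic birthday case
print(picks_until_collision_likely(36 ** 6))  # 6 characters from a-z and 0-9
```

    The second figure comes out at roughly 55,000 IDs, tiny compared with the 2176782336 possible values, which is why purely random short IDs need a uniqueness check on insert.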

  • 2020-12-02 12:14

    Why not just use a hashing algorithm and take a hash of the URL?

    If you are using random numbers, chances are you will get clashes, because they are non-deterministic.

    Hashes aren't provably unique, but there is a fairly good chance that the hash of a string will be unique.

    Correction

    Actually, wait: you want them to be human-readable. If you put them in hex, they are technically human-readable.

    Or you could use an algorithm that converts a hash into a human-readable string. If that string is just a different representation of the hash, for example the hash written in base 36, it should be as "unique" as the hash itself.
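
    A minimal sketch of the base-36 idea, assuming SHA-256 as the hash (the answer does not name one) and a hypothetical helper `hash_to_base36`:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def hash_to_base36(url: str) -> str:
    """Hash the URL and write the digest as a base-36 string."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    number = int.from_bytes(digest, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first; an all-zero digest maps to "0".
    return "".join(reversed(digits)) or "0"


short = hash_to_base36("http://example.com/some/long/path")
print(short, len(short))
```

    The same URL always yields the same string, but the full result is around 50 characters long. Cutting it down to 6 characters keeps only about 31 bits of the hash, so the collision concerns discussed in the other answers apply again.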
