How do sites like goo.gl or jsfiddle generate their URL codes?

离开以前 2021-01-30 09:15

I would like to generate a code like goo.gl and jsfiddle websites (http://jsfiddle.net/XzKvP/).

I tried different things, but they all give me something as long as a GUID, which is too large.

3 Answers
  • 2021-01-30 09:48

    You can think of the five-letter code as a number in base-62 notation: your "digits" are the 26 lowercase letters, the 26 uppercase letters, and the digits 0 to 9, for 26+26+10 = 62 digits in total. Given a number from 0 to 62^5 - 1 (62^5 equals 916132832), say your primary key, you can convert it to a base-62 string of at most five digits as follows:

    private static char Base62Digit(int d) {
        if (d < 26) {
            return (char)('a'+d);
        } else if (d < 52) {
            return (char)('A'+d-26);
        } else if (d < 62) {
            return (char)('0'+d-52);
        } else {
            throw new ArgumentException("d");
        }
    }
    
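    static string ToBase62(int n) {
        if (n < 0) {
            // n%62 would be negative and Base62Digit would return a wrong character
            throw new ArgumentOutOfRangeException("n");
        }
        if (n == 0) {
            return "a"; // 'a' is the zero digit; without this, 0 would encode as ""
        }
        var res = "";
        while (n != 0) {
            res = Base62Digit(n%62) + res;
            n /= 62;
        }
        return res;
    }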
    
    private static int Base62Decode(char c) {
        if (c >= '0' && c <= '9') {
            return 52 + c - '0';
        } else if (c >= 'A' && c <= 'Z') {
            return 26 + c - 'A';
        } else if (c >= 'a' && c <= 'z') {
            return c - 'a';
        } else {
            throw new ArgumentException("c");
        }
    }
    
    // Requires "using System.Linq;" for Aggregate.
    static int FromBase62(string s) {
        return s.Aggregate(0, (current, c) => current*62 + Base62Decode(c));
    }
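
    The functions above produce variable-length strings: small numbers give short codes. If every code should be exactly five characters, one option is to left-pad with 'a', the zero digit. A minimal sketch, assuming a hypothetical Base62Sketch class and a width parameter that are not part of the original answer:

```csharp
using System;

static class Base62Sketch
{
    // Same digit order as above: a-z, then A-Z, then 0-9.
    private const string Alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Encodes n as exactly `width` base-62 digits, left-padded with 'a'.
    public static string Encode(uint n, int width)
    {
        var digits = new char[width];
        for (int i = width - 1; i >= 0; i--)
        {
            digits[i] = Alphabet[(int)(n % 62)];
            n /= 62;
        }
        if (n != 0)
        {
            // Something is left over, so n needed more than `width` digits.
            throw new ArgumentOutOfRangeException(nameof(n));
        }
        return new string(digits);
    }

    // Inverse of Encode; leading 'a' characters contribute nothing.
    public static uint Decode(string s)
    {
        uint value = 0;
        foreach (char c in s)
        {
            int d = Alphabet.IndexOf(c);
            if (d < 0)
            {
                throw new ArgumentException("not a base-62 digit: " + c, nameof(s));
            }
            value = checked(value * 62 + (uint)d);
        }
        return value;
    }
}
```

    With this sketch, Base62Sketch.Encode(0, 5) gives "aaaaa" and Base62Sketch.Encode(916132831, 5), the largest five-digit value, gives "99999".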
    

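    Here is how to generate cryptographically strong random numbers (you need using directives for System.Linq and System.Security.Cryptography; note that RNGCryptoServiceProvider is marked obsolete from .NET 6 onward in favor of RandomNumberGenerator):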

    private static readonly RNGCryptoServiceProvider crypto =
        new RNGCryptoServiceProvider();
    
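    private static int NextRandom() {
        var buf = new byte[4];
        crypto.GetBytes(buf);
        // Pack the 4 bytes into an int, then keep the low 30 bits (0 .. 2^30-1).
        // Note that 2^30 is larger than 62^5, so some values need six base-62 digits.
        return buf.Aggregate(0, (p, v) => (p << 8) + v) & 0x3FFFFFFF;
    }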
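
    On .NET Core 3.0 and later, the static RandomNumberGenerator.GetInt32 method draws a uniform value below an upper bound directly, which avoids the manual masking and keeps every value inside the five-digit range. A sketch under that assumption; the RandomCodeSketch class and the NextCode name are illustrative and not from the original answer:

```csharp
using System;
using System.Security.Cryptography;

static class RandomCodeSketch
{
    // Same digit order as above: a-z, then A-Z, then 0-9.
    private const string Alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private const int CodeSpace = 916132832; // 62^5

    // Returns a random five-character base-62 code.
    public static string NextCode()
    {
        // Uniform in [0, 62^5), drawn from the system's cryptographic RNG.
        int n = RandomNumberGenerator.GetInt32(CodeSpace);
        var digits = new char[5];
        for (int i = 4; i >= 0; i--)
        {
            digits[i] = Alphabet[n % 62];
            n /= 62;
        }
        return new string(digits);
    }
}
```

    Random codes can repeat, so each new code still has to be checked against the ones already stored.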
    
  • 2021-01-30 09:52

    The solutions based on a random substring are no good because the outputs will collide. It may happen early (with bad luck), and it will eventually happen when the list of generated values grows large. The list doesn't even have to be that large for the probability of collisions to become high (see the birthday problem).
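
    To put a number on that, the usual birthday approximation p ≈ 1 - exp(-k(k-1)/(2n)) estimates the chance of at least one collision among k codes drawn uniformly from n possible values. A small sketch of that formula; the BirthdaySketch class and method name are made up for illustration:

```csharp
using System;

static class BirthdaySketch
{
    // Approximate probability that at least two of k random codes coincide
    // when each is drawn uniformly from n possible values.
    public static double CollisionProbability(double k, double n)
    {
        return 1.0 - Math.Exp(-k * (k - 1.0) / (2.0 * n));
    }
}
```

    With n = 62^5 = 916132832, CollisionProbability(36000, 916132832) comes out slightly above 0.5, so a collision is more likely than not after roughly 36,000 random five-character codes.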

    What's good for this problem is a pseudo-random permutation between the incrementing ID and its counterpart that will be shown in the URL. This technique guarantees that a collision is impossible, while still mapping into an output space that is as small as the input space.

    Implementation

    I suggest this C# version of a Feistel cipher with 32-bit blocks, 3 rounds and a round function that is inspired by pseudo-random generators.

    private static double RoundFunction(uint input)
    {
        // Must be a function in the mathematical sense (x=y implies f(x)=f(y))
        // but it doesn't have to be reversible.
        // Must return a value between 0 and 1
        return ((1369 * input + 150889) % 714025) / 714025.0;
    }
    
    private static uint PermuteId(uint id)
    {
        uint l1=(id>>16)&65535;
        uint r1=id&65535;
        uint l2, r2;
        for (int i = 0; i < 3; i++)
        {
            l2 = r1;
            r2 = l1 ^ (uint)(RoundFunction(r1) * 65535);
            l1 = l2;
            r1 = r2;
        }
        return ((r1 << 16) + l1);
    }
    

    To express the permuted ID in a base62 string:

    private static string GenerateCode(uint id)
    {
        return ToBase62(PermuteId(id));
    }
    

    The ToBase62 function is the same as in the previous answer except that it takes uint instead of int (otherwise these functions would have to be rewritten to deal with negative values).

    Customizing the algorithm

    RoundFunction is the secret sauce of the algorithm. You may change it to a non-public version, possibly including a secret key. The Feistel network has two very nice properties:

    • even if the supplied RoundFunction is not reversible, the algorithm guarantees that PermuteId() will be a permutation in the mathematical sense (which implies zero collisions).

    • changing the expression inside the round function even slightly will drastically change the list of final output values.

    Beware that putting something too trivial in the round expression would ruin the pseudo-random effect, although it would still work in terms of uniqueness of each PermuteId output. Also, an expression that wouldn't be a function in the mathematical sense would be incompatible with the algorithm, so for instance anything involving random() is not allowed.
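
    As an illustration of the "secret key" idea, the key can simply be mixed into the round function's input. The sketch below is one hypothetical way to do it (the key parameter, the XOR mixing and the KeyedFeistelSketch name are assumptions, not part of the original code). Because the result still depends only on the input and the key, the permutation guarantee is unchanged. A three-round network like this hides the sequence from casual observers, but it should not be treated as strong encryption.

```csharp
using System;

static class KeyedFeistelSketch
{
    // Hypothetical keyed variant: the low 16 bits of the key are XORed into
    // the input before the same linear-congruential step as in the answer.
    // Still a pure function of (input, key), which is all the network needs.
    private static double RoundFunction(uint input, uint key)
    {
        uint mixed = (input ^ key) & 0xFFFF;
        return ((1369 * mixed + 150889) % 714025) / 714025.0;
    }

    // Same 3-round Feistel network as PermuteId, with the key passed through.
    public static uint PermuteId(uint id, uint key)
    {
        uint l1 = (id >> 16) & 0xFFFF;
        uint r1 = id & 0xFFFF;
        for (int i = 0; i < 3; i++)
        {
            uint l2 = r1;
            uint r2 = l1 ^ (uint)(RoundFunction(r1, key) * 65535);
            l1 = l2;
            r1 = r2;
        }
        return (r1 << 16) + l1;
    }
}
```

    The same key must be used for every ID; changing it yields a different permutation of the same 32-bit range.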

    Reversibility

    In its current form, the PermuteId function is its own inverse, which means that:

    PermuteId(PermuteId(id))==id
    

    So given a short string produced by the program, if you convert it back to uint with a FromBase62 function, and give that as input to PermuteId(), that will return the corresponding initial ID. That's pretty cool if you don't have a database to store the [internal-ID / shortstring] relationships: they don't actually need to be stored!
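
    The round trip described above can be sketched as follows. The ShortCodeSketch class and the Encode/Decode names are illustrative; the round function and the permutation are copied from the code above, and the base-62 digit order (a-z, A-Z, 0-9) follows the previous answer.

```csharp
using System;

static class ShortCodeSketch
{
    private const string Alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private static double RoundFunction(uint input)
    {
        return ((1369 * input + 150889) % 714025) / 714025.0;
    }

    private static uint PermuteId(uint id)
    {
        uint l1 = (id >> 16) & 65535;
        uint r1 = id & 65535;
        for (int i = 0; i < 3; i++)
        {
            uint l2 = r1;
            uint r2 = l1 ^ (uint)(RoundFunction(r1) * 65535);
            l1 = l2;
            r1 = r2;
        }
        return (r1 << 16) + l1;
    }

    // ID -> permuted value -> base-62 string.
    public static string Encode(uint id)
    {
        uint n = PermuteId(id);
        if (n == 0) return "a"; // zero digit, so the code is never empty
        var res = "";
        while (n != 0)
        {
            res = Alphabet[(int)(n % 62)] + res;
            n /= 62;
        }
        return res;
    }

    // Base-62 string -> permuted value -> original ID (PermuteId undoes itself).
    public static uint Decode(string code)
    {
        uint n = 0;
        foreach (char c in code)
        {
            int d = Alphabet.IndexOf(c);
            if (d < 0) throw new ArgumentException("not a base-62 digit: " + c, nameof(code));
            n = checked(n * 62 + (uint)d); // throws if the code exceeds 32 bits
        }
        return PermuteId(n);
    }
}
```

    For any uint id, ShortCodeSketch.Decode(ShortCodeSketch.Encode(id)) returns id, with no lookup table involved.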

    Producing even shorter strings

    The range of the above function is 32 bits, that is about 4 billion values from 0 to 2^32-1. To express that range in base62, 6 characters are needed.

    With only 5 characters, we could hope to represent at most 62^5 values, which is a bit under 1 billion. Should the output string be limited to 5 characters, the code should be tweaked as follows:

    • find N such that N is even and 2^N is as high as possible but lower than 62^5. That's 28, so our real output range that fits in 62^5 is going to be 2^28, or about 268 million values.

    • in PermuteId, use 28/2=14-bit values for l1 and r1 instead of 16-bit values, while being careful not to ignore a single bit of the input (which must be less than 2^28).

    • multiply the result of RoundFunction by 16383 instead of 65535, to stay within the 14-bit range.

    • at the end of PermuteId, recombine r1 and l1 to form a 14+14=28-bit value instead of a 32-bit one.

    The same method could be applied for 4 characters, with an output range of 2^22, or about 4 million values.
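
    The 28-bit variant described in the steps above could look roughly like this. It is a sketch rather than tested production code: the Feistel28Sketch class and PermuteId28 name are made up here, and the explicit range check is an addition.

```csharp
using System;

static class Feistel28Sketch
{
    private static double RoundFunction(uint input)
    {
        return ((1369 * input + 150889) % 714025) / 714025.0;
    }

    // Permutes the range 0 .. 2^28-1 using two 14-bit halves.
    public static uint PermuteId28(uint id)
    {
        if (id >= (1u << 28))
        {
            // Bits above the 28th would be silently dropped otherwise.
            throw new ArgumentOutOfRangeException(nameof(id));
        }
        uint l1 = (id >> 14) & 16383;
        uint r1 = id & 16383;
        for (int i = 0; i < 3; i++)
        {
            uint l2 = r1;
            // Scale by 16383 so the round output stays within 14 bits.
            uint r2 = l1 ^ (uint)(RoundFunction(r1) * 16383);
            l1 = l2;
            r1 = r2;
        }
        return (r1 << 14) + l1;
    }
}
```

    Every result is below 2^28 = 268435456, which is less than 62^5, so it always fits in five base-62 characters.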

    What it looks like

    In the version above, the first 10 produced strings starting with id=1 are:

    cZ6ahF
    3t5mM
    xGNPN
    dxwUdS
    ej9SyV
    cmbVG3
    cOlRkc
    bfCPOX
    JDr8Q
    eg7iuA
    

    If I make a trivial change in the round function, that becomes:

    ey0LlY
    ddy0ak
    dDw3wm
    bVuNbg
    bKGX22
    c0s5GZ
    dfNMSp
    ZySqE
    cxKH4b
    dNqMDA
    
  • 2021-01-30 10:01

    This is what I ended up doing

    (Updated since Daniel Vérité's answer):

    using System;

    class Program
    {
    
        private static double RoundFunction(uint input)
        {
            // Must be a function in the mathematical sense (x=y implies f(x)=f(y))
            // but it doesn't have to be reversible.
            // Must return a value between 0 and 1
            return ((1369 * input + 150889) % 714025) / 714025.0;
        }
        private static char Base62Digit(uint d)
        {
            if (d < 26)
            {
                return (char)('a' + d);
            }
            else if (d < 52)
            {
                return (char)('A' + d - 26);
            }
            else if (d < 62)
            {
                return (char)('0' + d - 52);
            }
            else
            {
                throw new ArgumentException("d");
            }
        }
        private static string ToBase62(uint n)
        {
            if (n == 0)
            {
                return "a"; // 'a' is the zero digit; without this, 0 would encode as ""
            }
            var res = "";
            while (n != 0)
            {
                res = Base62Digit(n % 62) + res;
                n /= 62;
            }
            return res;
        }
        private static uint PermuteId(uint id)
        {
            uint l1 = (id >> 16) & 65535;
            uint r1 = id & 65535;
            uint l2, r2;
            for (int i = 0; i < 3; i++)
            {
                l2 = r1;
                r2 = l1 ^ (uint)(RoundFunction(r1) * 65535);
                l1 = l2;
                r1 = r2;
            }
            return ((r1 << 16) + l1);
        }
    
    
        private static string GenerateCode(uint id)
        {
            return ToBase62(PermuteId(id));
        }
    
        static void Main(string[] args)
        {
            Console.WriteLine("testing...");

            try
            {
                for (uint x = 1; x < 1000000; x += 1)
                {
                    Console.Write(GenerateCode(x) + ",");
                }
            }
            catch (Exception err)
            {
                Console.WriteLine("error: " + err.Message);
            }

            Console.WriteLine("");
            Console.WriteLine("Press 'Enter' to continue...");
            Console.Read();
        }
    }
    