Guid.NewGuid() VS a random string generator from Random.Next()

攒了一身酷 2021-02-01 17:51

My colleague and I are debating which of these methods to use for auto-generating user IDs and post IDs for identification in the database:

One option uses a single

7 Answers
  • 2021-02-01 17:54

    "Auto generating user ids and post ids for identification in the database"...why not use a database sequence or identity to generate keys?

    To me your question is really, "What is the best way to generate a primary key in my database?" If that is the case, you should use the conventional tool of the database which will either be a sequence or identity. These have benefits over generated strings.

    1. Sequence and identity columns index better. There are numerous articles and blog posts that explain why GUIDs and similar values make poor indexes.
    2. They are guaranteed to be unique within the table.
    3. They can be safely generated by concurrent inserts without collision.
    4. They are simple to implement.

    I guess my next question is, for what reasons are you considering GUIDs or generated strings? Will you be integrating across distributed databases? If not, you should ask yourself whether you are solving a problem that doesn't exist.

  • 2021-02-01 17:54

    Use System.Guid as it:

    ...can be used across all computers and networks wherever a unique identifier is required.

    Note that Random is a pseudo-random number generator. Its output is neither truly random nor guaranteed to be unique. It is seeded with only a 32-bit value, compared to the 128 bits of a GUID.

    However, even GUIDs can have collisions (although the chances are really slim), so you should use the database's own features to give you a unique identifier (e.g. an auto-increment ID column). Also, you cannot easily turn a GUID into a 4- or 20-character (alpha)numeric string.

  • 2021-02-01 17:54

    Regarding your edit, here is one reason to prefer a GUID over a generated string:

    The native storage for a GUID (uniqueidentifier) in SQL Server is 16 bytes. To store an equivalent-length varchar (string), where each "digit" in the id is stored as a character, would require somewhere between 32 and 38 bytes, depending on formatting.

    Because of this compact storage, SQL Server is also able to index a uniqueidentifier column more efficiently than a varchar column.
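
    The 16 versus 32 to 38 figures can be checked from the .NET side. The following is a minimal sketch; the GuidSizes class and its Measure method are hypothetical names introduced only for illustration, and the character counts equal byte counts only under a single-byte encoding such as varchar:

```csharp
using System;

static class GuidSizes
{
    // Returns the raw byte count and the character counts of three
    // standard Guid string formats.
    public static (int Raw, int N, int D, int B) Measure(Guid id)
    {
        return (id.ToByteArray().Length,   // binary form
                id.ToString("N").Length,   // hex digits only
                id.ToString("D").Length,   // hex digits with hyphens
                id.ToString("B").Length);  // hyphens plus surrounding braces
    }
}
```

    Calling GuidSizes.Measure(Guid.NewGuid()) yields 16 for the binary form and 32, 36 and 38 for the three text formats.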

  • 2021-02-01 18:13

    As written in other answers, my implementation had a few severe problems:

    • Thread safety: Random is not thread safe.
    • Predictability: the method couldn't be used for security critical identifiers like session tokens due to the nature of the Random class.
    • Collisions: Even though the method created 20 'random' characters, the number of distinct outputs is not (number of possible chars)^20, because the seed is only 31 bits and comes from a poor source. Given the same seed, a sequence of any length will be the same.

    Guid.NewGuid() would be fine, except that we don't want to use ugly GUIDs in URLs, and .NET's NewGuid() algorithm is not known to be cryptographically secure for use in session tokens: it might give predictable results if a little information is known.

    Here is the code we're using now. It is secure and flexible, and as far as I know it is very unlikely to create collisions if given enough length and a large enough character set:

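    using System;
    using System.Security.Cryptography;
    using System.Text;

    class RandomStringGenerator
    {
        // Cryptographically strong source. Newer .NET versions mark this type
        // obsolete in favor of RandomNumberGenerator.Create().
        private readonly RNGCryptoServiceProvider rand = new RNGCryptoServiceProvider();

        public string GetRandomString(int length, params char[] chars)
        {
            if (chars == null || chars.Length == 0)
                throw new ArgumentException("At least one character is required.", nameof(chars));

            // StringBuilder avoids allocating a new string on every iteration.
            var s = new StringBuilder(length);
            byte[] intBytes = new byte[4];
            for (int i = 0; i < length; i++)
            {
                rand.GetBytes(intBytes);
                uint randomInt = BitConverter.ToUInt32(intBytes, 0);
                // Slight modulo bias unless chars.Length divides 2^32 evenly.
                s.Append(chars[randomInt % chars.Length]);
            }
            return s.ToString();
        }
    }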
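
    One detail of GetRandomString worth noting: the % step is very slightly biased whenever chars.Length does not divide 2^32 evenly. A possible variant, sketched here under the assumption that .NET Core 3.0 or later is available, uses RandomNumberGenerator.GetInt32, which returns a uniformly distributed index. The class name UnbiasedRandomString and the alphabet parameter are hypothetical and not part of the code above:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UnbiasedRandomString
{
    // Builds a string of the given length from the characters in alphabet.
    public static string Create(int length, string alphabet)
    {
        if (string.IsNullOrEmpty(alphabet))
            throw new ArgumentException("Alphabet must not be empty.", nameof(alphabet));

        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            // GetInt32 returns a uniform value in [0, alphabet.Length).
            sb.Append(alphabet[RandomNumberGenerator.GetInt32(alphabet.Length)]);
        }
        return sb.ToString();
    }
}
```

    For example, UnbiasedRandomString.Create(20, "abcdefghijklmnopqrstuvwxyz0123456789") returns a 20-character lowercase alphanumeric token.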
    
  • 2021-02-01 18:14

    I am looking for a more in depth reason as to why the cooked up method may be more likely to generate collisions given the same degrees of freedom as a Guid.

    First, as others have noted, Random is not thread-safe; using it from multiple threads can cause it to corrupt its internal data structures so that it always produces the same sequence.

    Second, Random is seeded based on the current time. Two instances of Random created within the same millisecond (recall that a millisecond is several million processor cycles on modern hardware) will have the same seed, and therefore will produce the same sequence.

    Third, I lied. Random is not seeded based on the current time; it is seeded based on the amount of time the machine has been active. The seed is a 32 bit number, and since the granularity is in milliseconds, that's only a few weeks until it wraps around. But that's not the problem; the problem is: the time period in which you create that instance of Random is highly likely to be within a few minutes of the machine booting up. Every time you power-cycle a machine, or bring a new machine online in a cluster, there is a small window in which instances of Random are created, and the more that happens, the greater the odds are that you'll get a seed that you had before.

    (UPDATE: Newer versions of the .NET framework have mitigated some of these problems; in those versions you no longer have every Random created within the same millisecond have the same seed. However there are still many problems with Random; always remember that it is only pseudo-random, not crypto-strength random. Random is actually very predictable, so if you are relying on unpredictability, it is not suitable.)
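
    The effect of a repeated seed is easy to reproduce by passing the seed explicitly. This is a minimal sketch; SeedDemo and FirstValues are hypothetical names used only for illustration:

```csharp
using System;

static class SeedDemo
{
    // Returns the first `count` values produced by a Random created with `seed`.
    public static int[] FirstValues(int seed, int count)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = rng.Next();
        }
        return values;
    }
}
```

    Two calls such as SeedDemo.FirstValues(12345, 5) return identical arrays, so any identifiers derived from two Random instances that happen to share a seed would collide in exactly the same way.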

    As others have said: if you want a primary key for your database then have the database generate you a primary key; let the database do its job. If you want a globally unique identifier then use a guid; that's what they're for.

    And finally, if you are interested in learning more about the uses and abuses of guids then you might want to read my "guid guide" series; part one is here:

    http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-one.aspx

  • 2021-02-01 18:16

    Contrary to what some people have said in the comments, a GUID generated by Guid.NewGuid() is NOT dependent on any machine-specific identifier. Only type 1 GUIDs are; Guid.NewGuid() returns a type 4 GUID, which is mostly random.

    As long as you don't need cryptographic security, the Random class should be good enough, but if you want to be extra safe, use System.Security.Cryptography.RandomNumberGenerator. For the Guid approach, note that not all digits in a GUID are random. Quoting Wikipedia:

    In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the most significant bits of N indicates the variant (depending on the variant; one, two or three bits are used). The variant covered by the UUID specification is indicated by the two most significant bits of N being 1 0 (i.e. the hexadecimal N will always be 8, 9, A, or B). In the variant covered by the UUID specification, there are five versions. For this variant, the four bits of M indicates the UUID version (i.e. the hexadecimal M will either be 1, 2, 3, 4, or 5).
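
    The M and N positions from the quote can be read straight out of the string form. This is a minimal sketch; GuidLayout and Inspect are hypothetical names, and the expected values assume a runtime where Guid.NewGuid() returns a type 4 GUID as described above:

```csharp
using System;

static class GuidLayout
{
    // Returns the version digit (M) and the variant digit (N) of a GUID.
    public static (char Version, char Variant) Inspect(Guid id)
    {
        // "D" format: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx (lowercase hex)
        string d = id.ToString("D");
        return (d[14], d[19]);
    }
}
```

    For a GUID from Guid.NewGuid(), Inspect should report '4' for the version and one of '8', '9', 'a' or 'b' for the variant, which leaves 122 of the 128 bits random.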
