MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)

Backend · unresolved · 4 answers · 1853 views
Asked by 谎友^ on 2020-12-24 07:47

tl;dr: Is assigning rows IDs of {unixtimestamp}{randomdigits} (such as 1308022796123456) as a BIGINT a good idea if I don't want to deal with UUIDs?
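A minimal sketch of the scheme the question describes (Python; the 6-digit random suffix is an assumption matching the example ID, not something the question specifies):

```python
import random
import time

def make_id(random_digits: int = 6) -> int:
    """Build an ID shaped like {unixtimestamp}{randomdigits}.

    Two inserts in the same second collide whenever they draw the
    same random suffix, and the birthday problem makes that far more
    likely than 1-in-a-million as write volume grows.
    """
    ts = int(time.time())                     # e.g. 1308022796
    suffix = random.randrange(10 ** random_digits)
    return ts * 10 ** random_digits + suffix  # e.g. 1308022796123456
```

With 6 random digits the result stays around 16 decimal digits, comfortably inside a signed BIGINT.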

4 Answers
  • 2020-12-24 08:26

    I have run into this very problem in my professional life. We used timestamp + random number and ran into serious issues when our applications scaled up (more clients, more servers, more requests). Granted, we (stupidly) used only 4 digits, then changed to 6, but you would be surprised how often the errors still happened.

    Over a long enough period of time, you are guaranteed to get duplicate key errors. Our application is mission critical, and therefore even the smallest chance it could fail due to inherently random behavior was unacceptable. We started using UUIDs to avoid this issue, and carefully managed their creation.

    Using UUIDs, your index size will increase, and a larger index will result in poorer performance (perhaps unnoticeable, but poorer nonetheless). Note that MySQL has no native UUID type; store UUIDs as BINARY(16) rather than as a VARCHAR (never use a VARCHAR as a primary key!), and MySQL can handle indexing, searching, etc. pretty efficiently even compared to BIGINT. The biggest performance hit to your index is almost always the number of rows indexed, rather than the size of the item being indexed (unless you want to index on a LONGTEXT or something ridiculous like that).

    To answer your question: BIGINT (with random digits attached) will be OK if you do not plan on scaling your application/service significantly. If your code can handle the change without much alteration and your application will not explode when a duplicate key error occurs, go with it. Otherwise, bite the bullet and go for the more substantial option.

    You can always implement a larger change later, like switching to an entirely different backend (which we are now facing... :P)

  • 2020-12-24 08:28

    If you want to use the timestamp method then do this:

    Give each server a number; to that, append the process ID of the application doing the insert (or the thread ID; in PHP it's getmypid()); then append how long that process has been alive/active (in PHP you can derive this via getrusage()); and finally add a counter that starts at 0 on each script invocation (i.e. each insert within the same script increments it by one).

    Also, you don't need to store the full unix timestamp - most of those digits just say it's year 2011 and not year 1970. So if you can't get a number saying how long the process has been alive, then at least subtract a fixed timestamp representing today - that way you'll need far fewer digits.
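    The steps above can be sketched as follows (Python; the epoch offset, server number, and the digit widths are all my assumptions for illustration, not part of the answer):

    ```python
    import os
    import time
    from itertools import count

    EPOCH_OFFSET = 1_600_000_000  # assumed fixed recent epoch, so the timestamp needs fewer digits
    SERVER_NO = 3                 # assumed per-server number that you assign yourself (2 digits here)
    _counter = count()            # restarts at 0 on each script invocation

    def make_id() -> int:
        """Concatenate server number, process ID, seconds since the
        fixed epoch, and a per-invocation counter into one integer."""
        short_ts = int(time.time()) - EPOCH_OFFSET       # ~9 digits until ~2052
        return int(
            f"{SERVER_NO:02d}"
            f"{os.getpid() % 10_000:04d}"                # low 4 digits of the PID
            f"{short_ts:09d}"
            f"{next(_counter) % 1_000:03d}"              # wraps after 1000 inserts per run
        )
    ```

    At most 18 decimal digits, so the result fits a signed BIGINT; note the counter wrap and the PID truncation mean this is still only collision-resistant, not collision-proof.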

  • 2020-12-24 08:33

    You can manually change the AUTO_INCREMENT starting value.

    ALTER TABLE foo AUTO_INCREMENT = ####
    

    An unsigned INT can store up to 4,294,967,295; let's round that down to 4,290,000,000.

    Use the first 3 digits for the server serial number, and the final 7 digits for the row id.

    This gives you up to 429 servers (numbered 000 to 428), and up to 10 million IDs for each server.

    So for server #172 you manually change the autonumber to start at 1,720,000,000, then let it assign IDs sequentially.

    If you think you might have more servers, but fewer IDs per server, then adjust it to 4 digits per server and 6 for the ID (i.e. up to 1 million IDs).

    You can also split the number using binary digits instead of decimal digits (perhaps 10 binary digits per server, and 22 for the ID. So, for example, server 76 starts at 2^22*76 = 318,767,104 and ends at 322,961,407).

    For that matter you don't even need a clear split. Take 4,294,967,295 divide it by the maximum number of servers you think you will ever have, and that's your spacing.

    You could use a BIGINT if you think you need more identifiers, but that's a seriously huge number.
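    The range arithmetic above can be checked with a short sketch (Python; the function names are mine):

    ```python
    def decimal_start(server_no: int, id_digits: int = 7) -> int:
        """First AUTO_INCREMENT value for a server under the decimal
        split (3-digit server prefix, 7-digit row id)."""
        return server_no * 10 ** id_digits

    def binary_start(server_no: int, id_bits: int = 22) -> int:
        """First AUTO_INCREMENT value under the binary split
        (10 bits of server number, 22 bits of row id)."""
        return server_no << id_bits

    print(decimal_start(172))  # 1720000000, matching the example above
    print(binary_start(76))    # 318767104
    ```

    On server #172 you would then run `ALTER TABLE foo AUTO_INCREMENT = 1720000000` once and let MySQL assign IDs sequentially from there.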

  • 2020-12-24 08:45

    Use the GUID as a unique index, but also calculate a 64-bit (BIGINT) hash of the GUID, store that in a separate NOT UNIQUE column, and index it. To retrieve, query for a match to both columns - the 64-bit index should make this efficient.

    What's good about this is that the hash:
    a. Doesn't have to be unique.
    b. Is likely to be well-distributed.

    The cost: extra 8-byte column and its index.
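    A sketch of deriving that 64-bit value (Python; truncating SHA-256 is my choice of hash, and the column names in the comment are hypothetical - any well-distributed 64-bit hash works):

    ```python
    import hashlib
    import uuid

    def guid_hash64(guid: uuid.UUID) -> int:
        """Truncate SHA-256 of the GUID's 16 raw bytes to a signed
        64-bit integer, suitable for a BIGINT column."""
        digest = hashlib.sha256(guid.bytes).digest()
        return int.from_bytes(digest[:8], "big", signed=True)

    g = uuid.uuid4()
    h = guid_hash64(g)
    # Store (g, h), index only h, and look up with something like:
    #   SELECT ... WHERE guid_hash = ? AND guid = ?
    # The BIGINT index narrows the search; the GUID comparison
    # resolves the rare hash collision.
    ```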
