UUID performance in MySQL?

猫巷女王i 2020-11-27 10:14

We're considering using UUID values as primary keys for our MySQL database. The data being inserted is generated from dozens, hundreds, or even thousands of remote computer

9 Answers
  • 2020-11-27 10:52

    A UUID is a Universally Unique ID. It's the "universally" part that you should be considering here.

    Do you really need the IDs to be universally unique? If so, then UUIDs may be your only choice.

    I would strongly suggest that if you do use UUIDs, you store them as a number (a 16-byte binary value) and not as a 36-character string. If you have 50M+ records, the saving in storage space will improve your performance (although I couldn't say by how much).

    If your IDs do not need to be universally unique, then I don't think you can do much better than just using auto_increment, which guarantees that IDs will be unique within a table (since the value increments each time).
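
    As a rough illustration of the size difference, here is a minimal Python sketch using the standard `uuid` module. The helper names `to_binary` and `to_text` are made up for this example; in MySQL the equivalent would be a 16-byte binary column instead of a 36-character text column.

```python
import uuid


def to_binary(text: str) -> bytes:
    """Convert the canonical 36-character form into the raw 16 bytes."""
    return uuid.UUID(text).bytes


def to_text(raw: bytes) -> str:
    """Convert 16 raw bytes back into the canonical hyphenated form."""
    return str(uuid.UUID(bytes=raw))


if __name__ == "__main__":
    text = str(uuid.uuid4())
    raw = to_binary(text)
    # The text form is more than twice the size of the binary form.
    print(len(text), "characters as text,", len(raw), "bytes as binary")
```

    Every index that contains the key repeats it, so the difference is multiplied across the primary key and any secondary indexes that carry it.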

  • 2020-11-27 10:59

    The short answer is that many databases have performance problems (in particular with high INSERT volumes) due to a conflict between their indexing method and UUIDs' deliberate entropy in the high-order bits. There are several common hacks:

    • choose a different index type (e.g. nonclustered on MSSQL) that doesn't mind it
    • munge the data to move the entropy to lower-order bits (e.g. reordering bytes of V1 UUIDs on MySQL)
    • make the UUID a secondary key with an auto-increment int primary key

    ... but these are all hacks--and probably fragile ones at that.
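
    To make the byte-reordering hack concrete, here is a minimal Python sketch, assuming version 1 (time-based) UUIDs. It moves the high and middle timestamp fields in front of the low field so that the stored bytes sort roughly by creation time; as far as I know this is the same layout that MySQL 8.0's `UUID_TO_BIN(uuid, 1)` is documented to produce. The names `v1_from_timestamp`, `reorder_v1`, and `restore_v1` are hypothetical helpers for this example only.

```python
import uuid


def v1_from_timestamp(ts: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID from a 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # set version to 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def reorder_v1(u: uuid.UUID) -> bytes:
    """Return 16 bytes laid out as time_hi, time_mid, time_low, then the rest."""
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def restore_v1(raw: bytes) -> uuid.UUID:
    """Undo reorder_v1 and return the original UUID."""
    return uuid.UUID(bytes=raw[4:8] + raw[2:4] + raw[0:2] + raw[8:])
```

    With the standard layout the fastest-changing timestamp bits come first, so consecutive values land all over the index. After reordering, values generated later compare greater, at the cost of having to convert on every read and write.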

    The best answer, but unfortunately the slowest one, is to demand your vendor improve their product so it can deal with UUIDs as primary keys just like any other type. They shouldn't be forcing you to roll your own half-baked hack to make up for their failure to solve what has become a common use case and will only continue to grow.

  • 2020-11-27 11:01

    At my job, we use UUID as PKs. What I can tell you from experience is DO NOT USE THEM as PKs (SQL Server by the way).

    It's one of those things that is fine when you have fewer than 1,000 records, but when you have millions, it's the worst thing you can do. Why? Because UUIDs are not sequential, so every time a new record is inserted MSSQL has to find the correct page for it and then insert it there. The really ugly consequence is that the pages end up with different sizes and become fragmented, so we now have to defragment periodically.

    When you use an autoincrement, MSSQL will always go to the last page, and you end up with equally sized pages (in theory) so the performance to select those records is much better (also because the INSERTs will not block the table/page for so long).
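
    The difference in insert position can be shown with a toy Python sketch. It is not a model of SQL Server's storage engine: it only keeps keys in a sorted list and counts how often a new key lands at the very end (the "last page" case) rather than somewhere in the middle. The names `append_fraction` and `random_keys` are hypothetical, and random 128-bit integers stand in for non-sequential UUIDs.

```python
import bisect
import random


def append_fraction(keys) -> float:
    """Fraction of inserts that land at the end of the sorted order."""
    sorted_keys = []
    appends = 0
    total = 0
    for key in keys:
        pos = bisect.bisect_right(sorted_keys, key)
        if pos == len(sorted_keys):
            appends += 1  # new key is the largest so far: goes to the end
        sorted_keys.insert(pos, key)
        total += 1
    return appends / total if total else 0.0


def random_keys(n: int, seed: int = 0):
    """Stand-in for random UUIDs: n random 128-bit integers."""
    rng = random.Random(seed)
    return [rng.getrandbits(128) for _ in range(n)]


if __name__ == "__main__":
    print("sequential:", append_fraction(range(2000)))
    print("random:    ", append_fraction(random_keys(2000)))
```

    Sequential keys always append, while random keys almost never do, which is why a clustered index on a random UUID keeps touching pages in the middle of the table.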

    However, the big advantage of using UUID as PKs is that if we have clusters of DBs, there will not be conflicts when merging.

    I would recommend the following model:

    1. An INT IDENTITY column as the PK.
    2. An additional column, generated automatically, that holds the UUID.

    This way, the merge process is possible (UUID would be your REAL key, while the PK would just be something temporary that gives you good performance).
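
    Here is a minimal sketch of that model. It uses Python's built-in `sqlite3` as a stand-in, not SQL Server or MySQL, so `AUTOINCREMENT` plays the role of IDENTITY / auto_increment, and the UUID is generated in application code rather than by the database. The table name `records`, the column names, and the helper functions are all hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- local, sequential key
        public_id BLOB NOT NULL UNIQUE,               -- 16-byte UUID, the "real" key
        payload   TEXT NOT NULL
    )
    """
)


def insert_record(payload: str) -> uuid.UUID:
    """Insert a row; the UUID is generated here and returned to the caller."""
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO records (public_id, payload) VALUES (?, ?)",
        (public_id.bytes, payload),
    )
    return public_id


def find_by_uuid(public_id: uuid.UUID):
    """Look a row up by its UUID via the unique secondary index."""
    return conn.execute(
        "SELECT id, payload FROM records WHERE public_id = ?",
        (public_id.bytes,),
    ).fetchone()
```

    Rows are physically ordered by the small integer key, while anything that crosses database boundaries (merges, replication, external references) uses `public_id`.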

    NOTE: The best solution is to use NEWSEQUENTIALID (as I said in the comments), but for a legacy app with little time to refactor (and, even worse, no control over all the inserts), that is not possible. Still, as of 2017, I'd say the best solution here is NEWSEQUENTIALID or Guid.Comb with NHibernate.

    Hope this helps
