Proposed solution: Generate unique IDs in a distributed environment

后端 未结 2 1651
深忆病人
深忆病人 2021-02-01 08:47

I\'ve been browsing the net trying to find a solution that will allow us to generate unique IDs in a regionally distributed environment.

I looked at the following option

相关标签:
2条回答
  • 2021-02-01 09:02

    I think this looks pretty solid. Each region maintains consistency, and if you use XDCR there are no collisions. INCR is atomic within a cluster, so you will have no issues there. You don't actually need to have the Machine code part of it. If all the app servers within a region are connected to the same cluster, it's irrelevant to infix the 00001 part of it. If that is useful for you for other reasons (some sort of analytics) then by all means, but it isn't necessary.

    So it can simply be '4' . 1' (using your example)

    Can you give me an example of what kind of "sorting" you need?

    First: One downside of adding entropy (and I am not sure why you would need it), is you cannot iterate over the ID collection as easily.

    For Example: If you ID's from 1-100, which you will know from a simple GET query on the Counter key, you could assign tasks by group, this task takes 1-10, the next 11-20 and so on, and workers can execute in parallel. If you add entropy, you will need to use a Map/Reduce View to pull the collections down, so you are losing the benefit of a key-value pattern.

    Second: Since you are concerned with readability, it can be valuable to add a document/object type identifier as well, and this can be used in Map/Reduce Views (or you can use a json key to identify that).

    Ex: 'u:' . '4' . '1'

    If you are referring to ID's externally, you might want to obscure in other ways. If you need an example, let me know and I can append my answer with something you could do.

    @scalabl3

    0 讨论(0)
  • 2021-02-01 09:28

    You are concerned about IDs for two reasons:

    1. Potential for collisions in a complex network infrastructure
    2. Appearance

    Starting with the second issue, Appearance. While a UUID certainly isn't a great beauty when it comes to an identifier, there are diminishing returns as you introduce a truly unique number across a complex data center (or data centers) as you mention. I'm not convinced that there is a dramatic change in perception of an application when a long number versus a UUID is used for example in a URL to a web application. Ideally, neither would be shown, and the ID would only ever be sent via Ajax requests, etc. While a nice clean memorable URL is preferable, it's never stopped me from shopping at Amazon (where they have absolutely hideous URLs). :)

    Even with your proposal, the identifiers, while they would be shorter in the number of characters than a UUID, they are no more memorable than a UUID. So, the appearance likely would remain debatable.

    Talking about the first point..., yes, there are a few cases where UUIDs have been known to generate conflicts. While that shouldn't happen in a properly configured and consistently obtained architecture, I can see how it might happen (but I'm personally a lot less concerned about it).

    So, if you're talking about alternatives, I've become a fan of the simplicity of the MongoDB ObjectId and its techniques for avoiding duplication when generating an ID. The full documentation is here. The quick relevant pieces are similar to your potential design in several ways:

    ObjectId is a 12-byte BSON type, constructed using:

    • a 4-byte value representing the seconds since the Unix epoch,
    • a 3-byte machine identifier,
    • a 2-byte process id, and
    • a 3-byte counter, starting with a random value.

    The timestamp can often be useful for sorting. The machine identifier is similar to your application server having a unique ID. The process id is just additional entropy, and finally to prevent conflicts, there is a counter that is auto incremented whenever the timestamp is the same as the last time an ObjectId is generated (so that ObjectIds can be created rapidly). ObjectIds can be generated on the client or on the database. Further, ObjectIds do take up fewer bytes than a UUID (but only 4). Of course, you could not use the timestamp and drop 4 bytes.

    For clarification, I'm not suggesting you use MongoDB, but be inspired by the technique they use for ID generation.

    So, I think your solution is decent (and maybe you want to be inspired by MongoDB's implementation of a unique ID) and doable. As to whether you need to do it, I think that's a question only you can answer.

    0 讨论(0)
提交回复
热议问题