Distributed sequence number generation?

小鲜肉 2020-11-29 14:32

I've generally implemented sequence number generation using database sequences in the past.

e.g. Using Postgres SERIAL type http://www.neilconway.o

13 Answers
  • 2020-11-29 15:03

    You could give each node a unique ID (which it may have anyway) and then prepend that to the sequence number.

    For example, node 1 generates the sequence 001-00001, 001-00002, 001-00003, etc., and node 5 generates 005-00001, 005-00002, and so on.

    Unique :-)
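
    A minimal sketch of the node-prefix idea, assuming each node already knows its own numeric ID. The class name NodeSequence and the 3-digit/5-digit zero padding are illustrative choices taken from the example above, not part of any library:

```python
import itertools


class NodeSequence:
    """Per-node counter whose IDs are prefixed with the node's unique ID."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self._counter = itertools.count(1)  # local counter, starts at 1

    def next_id(self) -> str:
        # The padding widths (3 and 5 digits) mirror the example above;
        # they are illustrative, not a recommendation.
        return f"{self.node_id:03d}-{next(self._counter):05d}"


node1 = NodeSequence(1)
node5 = NodeSequence(5)
ids = [node1.next_id(), node1.next_id(), node5.next_id()]
print(ids)  # ['001-00001', '001-00002', '005-00001']
```

    The IDs are unique across nodes without any coordination, but they are only ordered within a single node.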

    Alternatively, if you want some sort of centralized system, you could have your sequence server hand out IDs in blocks. This reduces the overhead significantly. For example, instead of requesting a new ID from the central server for each ID that must be assigned, you request IDs in blocks of 10,000 and only make another network request when you run out.
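
    The block approach could look roughly like this. It is a sketch only: CentralAllocator is a hypothetical in-process stand-in for the central server, and its requests counter simulates network round trips.

```python
import threading


class CentralAllocator:
    """Stands in for the central sequence server (hypothetical, in-process)."""

    def __init__(self):
        self._next_start = 1
        self._lock = threading.Lock()
        self.requests = 0  # counts simulated network round trips

    def reserve_block(self, size: int) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += size
            self.requests += 1
            return range(start, start + size)


class BlockClient:
    """Hands out IDs locally and refills from the server when a block runs out."""

    def __init__(self, server: CentralAllocator, block_size: int = 10_000):
        self._server = server
        self._block_size = block_size
        self._block = iter(())  # empty until the first request
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._block)
            except StopIteration:
                # Current block is used up: fetch the next one from the server.
                self._block = iter(self._server.reserve_block(self._block_size))
                return next(self._block)


server = CentralAllocator()
a, b = BlockClient(server), BlockClient(server)
first_a = a.next_id()   # 1, from block 1..10000
first_b = b.next_id()   # 10001, from block 10001..20000
for _ in range(10_000):
    a.next_id()         # exhausts a's first block and triggers one refill
print(server.requests)  # 3
```

    Any IDs left in a block when a client stops are never issued, so the overall sequence can have gaps, and IDs from different clients interleave out of order.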

  • 2020-11-29 15:05

    There are a few strategies, but none that I know of can be truly distributed and still give a real sequence.

    1. Have a central number generator. It doesn't have to be a big database. memcached has a fast atomic counter; in the vast majority of cases it's fast enough for your entire cluster.
    2. Reserve a separate integer range for each node (like Steven Schlanskter's answer).
    3. Use random numbers or UUIDs.
    4. Use some piece of data, together with the node's ID, and hash it all (or HMAC it).

    Personally, I'd lean toward UUIDs, or memcached if I want a mostly contiguous space.
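
    Strategy 4 might be sketched as follows with Python's standard hmac module. The function name hashed_id, the demo key, and the length-prefix framing are assumptions made for illustration:

```python
import hashlib
import hmac


def hashed_id(secret: bytes, node_id: str, payload: bytes) -> str:
    """Derive an ID by HMAC-ing the node ID together with some piece of data."""
    node = node_id.encode("utf-8")
    # Length-prefix the node ID so that ("ab", b"c") and ("a", b"bc")
    # do not produce the same message by plain concatenation.
    message = len(node).to_bytes(2, "big") + node + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


key = b"demo-key"  # hypothetical secret shared by all nodes
id_a = hashed_id(key, "node-1", b"order:42")
id_b = hashed_id(key, "node-2", b"order:42")
print(id_a != id_b)  # True
```

    The result is deterministic, so the same data on the same node always maps to the same ID. The data therefore has to be unique per item, and the IDs carry no ordering at all.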

  • 2020-11-29 15:07

    The problem is similar to one in the iSCSI world, where each LUN/volume has to be uniquely identifiable by the initiators running on the client side. The iSCSI standard says that the first few bits have to represent the storage provider/manufacturer information, and the rest increase monotonically.

    Similarly, in a distributed system of nodes, one can use the initial bits to represent the node ID and let the rest increase monotonically.
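
    As a rough illustration of that bit layout, here is a sketch that packs a node ID into the top bits of a 64-bit integer. The 10/54 split and the function names are assumptions, not something the iSCSI standard prescribes:

```python
from typing import Tuple

NODE_BITS = 10      # assumed width for the node ID
COUNTER_BITS = 54   # remaining bits of a 64-bit ID


def make_id(node_id: int, counter: int) -> int:
    """Pack the node ID into the high bits and the counter into the low bits."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return (node_id << COUNTER_BITS) | counter


def split_id(value: int) -> Tuple[int, int]:
    """Recover (node_id, counter) from a packed ID."""
    return value >> COUNTER_BITS, value & ((1 << COUNTER_BITS) - 1)


packed = make_id(5, 7)
print(split_id(packed))  # (5, 7)
```

    Because the node ID occupies the high bits, all IDs from one node sort together, and within a node they sort by counter.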

  • 2020-11-29 15:09

    Why not use a (thread safe) UUID generator?

    I should probably expand on this.

    UUIDs are guaranteed to be globally unique (if you avoid the ones based on random numbers, where the uniqueness is just highly probable).

    Your "distributed" requirement is met, regardless of how many UUID generators you use, by the global uniqueness of each UUID.

    Your "thread safe" requirement can be met by choosing "thread safe" UUID generators.

    Your "sequence number" requirement is assumed to be met by the guaranteed global uniqueness of each UUID.
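
    For instance, Python's standard uuid module offers both kinds. This is only a sketch of the distinction drawn above; whether a particular generator is safe under concurrent use depends on the platform and library, so check its documentation:

```python
import uuid

# Version 1: built from a timestamp plus a node identifier
# (usually derived from the machine's network address).
time_based = uuid.uuid1()

# Version 4: built from random bits, so uniqueness is only highly probable.
random_based = uuid.uuid4()

print(time_based.version, random_based.version)  # 1 4

# is_safe reports whether the platform generated the version 1 UUID
# in a multiprocessing-safe way (it may be "unknown").
print(time_based.is_safe)

batch = [uuid.uuid1() for _ in range(1000)]
```

    Note that UUIDs identify items uniquely but do not count them: there is no "next" UUID, so they replace a sequence only where uniqueness is all that is needed.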

    Note that many database sequence number implementations (e.g. Oracle) do not guarantee either monotonically increasing, or (even) increasing sequence numbers (on a per "connection" basis). This is because a consecutive batch of sequence numbers gets allocated in "cached" blocks on a per connection basis. This guarantees global uniqueness and maintains adequate speed. But the sequence numbers actually allocated (over time) can be jumbled when they are being allocated by multiple connections!

  • 2020-11-29 15:13

    Now there are more options.

    Though this question is "old", I got here, so I think it might be useful to leave the options I know of (so far):

    • You could try Hazelcast. Its 1.9 release includes a distributed implementation of java.util.concurrent.AtomicLong.
    • You can also use Zookeeper. It provides methods for creating sequence nodes (appended to znode names, though I prefer using version numbers of the nodes). Be careful with this one though: if you don't want missed numbers in your sequence, it may not be what you want.

    Cheers

  • 2020-11-29 15:14

    OK, this is a very old question, which I'm first seeing now.

    You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criterion (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.

    Unique IDs are another matter. There are several good ways of generating unique IDs in a decentralized manner:

    a) You could use Twitter's Snowflake ID network service. Snowflake is a:

    • Networked service, i.e. you make a network call to get a unique ID;
    • which produces 64 bit unique IDs that are ordered by generation time;
    • and the service is highly scalable and (potentially) highly available; each instance can generate many thousands of IDs per second, and you can run multiple instances on your LAN/WAN;
    • written in Scala, runs on the JVM.

    b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of:

    • The most significant 40 or so bits: A timestamp; the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sortable by generation time.)

    • The next 14 or so bits: A per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamps) do not overlap.

    • The last 10 or so bits: A unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
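
    Option b) could be sketched like this, using exactly 40/14/10 bits. This is not Snowflake's actual code: the class name, the custom epoch, the injectable clock, and the choice to reset the counter each millisecond and to move one millisecond ahead when the counter is exhausted are all assumptions made for the example.

```python
import threading
import time

TIMESTAMP_BITS, COUNTER_BITS, GENERATOR_BITS = 40, 14, 10
EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch


class IdGenerator:
    """64-bit IDs laid out as: timestamp (ms) | per-generator counter | generator ID."""

    def __init__(self, generator_id: int, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= generator_id < (1 << GENERATOR_BITS):
            raise ValueError("generator_id out of range")
        self._generator_id = generator_id
        self._clock = clock  # returns Unix time in milliseconds
        self._last_ms = -1
        self._counter = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never step backwards, even if the wall clock does.
            now = max(self._clock() - EPOCH_MS, self._last_ms, 0)
            if now == self._last_ms:
                self._counter += 1
                if self._counter >= (1 << COUNTER_BITS):
                    # Counter exhausted for this millisecond: move on to the next.
                    now += 1
                    self._counter = 0
            else:
                self._counter = 0
            self._last_ms = now
            # No overflow check: 40 bits of milliseconds last about 34 years.
            return (
                (now << (COUNTER_BITS + GENERATOR_BITS))
                | (self._counter << GENERATOR_BITS)
                | self._generator_id
            )


gen = IdGenerator(generator_id=3)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids))  # True
```

    Each generator still needs its unique 10-bit value assigned somehow (configuration, or a one-time registration at startup), which is the only coordination this scheme requires.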

    c) You could generate the IDs on the clients, using just a timestamp and random value. This avoids the need to know all generators, and assign each generator a unique value. On the flip side, such IDs are not guaranteed to be globally unique, they're only very highly likely to be unique. (To collide, one or more generators would have to create the same random value at the exact same time.) Something along the lines of:

    • The most significant 32 bits: Timestamp, the generation time of the ID.
    • The least significant 32 bits: 32 bits of randomness, generated anew for each ID.
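
    A sketch of option c), assuming a second-resolution Unix timestamp and the standard secrets module as the random source (the function name is made up for the example):

```python
import secrets
import time


def random_timestamp_id() -> int:
    """64-bit ID: 32-bit Unix timestamp (seconds) followed by 32 random bits."""
    timestamp = int(time.time()) & 0xFFFFFFFF  # 32 bits of seconds wrap in 2106
    return (timestamp << 32) | secrets.randbits(32)


new_id = random_timestamp_id()
print(f"{new_id:016x}")
```

    With only 32 random bits, the chance of a collision among n IDs generated in the same second is roughly n²/2³³ (the birthday bound), so this layout suits modest generation rates; a wider random field lowers the risk.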

    d) The easy way out: use UUIDs/GUIDs.
