Is python uuid1 sequential as timestamps?

后端未结

关注

 5  1226

Python docs states that uuid1 uses current time to form the uuid value. But I could not find a reference that ensures UUID1 is sequential.

>>> impor


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2020-12-19 00:55
              
            
            
                                                                       
Argumentless use of uuid.uuid1() gives non-sequential results (see answer by @basil-bourque), but it can be easily made sequential if you set clock_seq or node arguments (because in this case uuid1 uses python implementation that guarantees to have unique and sequential timestamp part of the UUID in current process):

import time

from uuid import uuid1, getnode
from random import getrandbits

_my_clock_seq = getrandbits(14)
_my_node = getnode()


def sequential_uuid(node=None):
    return uuid1(node=node, clock_seq=_my_clock_seq)


def alt_sequential_uuid(clock_seq=None):
    return uuid1(node=_my_node, clock_seq=clock_seq)



if __name__ == '__main__':
    from itertools import count
    old_n = uuid1()  # "Native"
    old_s = sequential_uuid()  # Sequential

    native_conflict_index = None

    t_0 = time.time()

    for x in count():
        new_n = uuid1()
        new_s = sequential_uuid()

        if old_n > new_n and not native_conflict_index:
            native_conflict_index = x

        if old_s >= new_s:
            print("OOops: non-sequential results for `sequential_uuid()`")
            break

        if (x >= 10*0x3fff and time.time() - t_0 > 30) or (native_conflict_index and x > 2*native_conflict_index):
            print('No issues for `sequential_uuid()`')
            break

        old_n = new_n
        old_s = new_s

    print(f'Conflicts for `uuid.uuid1()`: {bool(native_conflict_index)}')
    print(f"Tries: {x}")



Multiple processes issues

BUT if you are running some parallel processes on the same machine, then:


node which defaults to uuid.get_node() will be the same for all the processes;
clock_seq has small chance to be the same for some processes (chance of 1/16384)


That might lead to conflicts! That is general concern for using
   uuid.uuid1 in parallel processes on the same machine unless you have access to SafeUUID from Python3.7.

If you make sure to also set node to unique value for each parallel process that runs this code, then conflicts should not happen.

Even if you are using SafeUUID, and set unique node, it's still possible to have non-sequential ids if they are generated in different processes.

If some lock-related overhead is acceptable, then you can store clock_seq in some external atomic storage (for example in "locked" file) and increment it with each call: this allows to have same value for node on all parallel processes and also will make id-s sequential. For cases when all parallel processes are subprocesses created using multiprocessing: clock_seq can be "shared" using multiprocessing.Value
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  故里飘歌        
                
              
                            
                2020-12-19 01:04
              
            
            
                                                                       
I stumbled upon a probable answer in Cassandra/Python from http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/

Lexicographic TimeUUID ordering

Cassandra provides, among all the primitive types, support for UUID values of type 1 (time and server based) and type 4 (random).

The primary use of UUID (Unique Universal IDentifier) is to obtain a really unique identifier in a potentially distributed environment.

Cassandra does support version 1 UUID. It gives you an unique identifier by combining the computer’s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.

As you can see the precision is only 100 nanoseconds, but fortunately it is mixed with a clock sequence to add randomness. Furthermore the MAC address is also used to compute the UUID so it’s very unlikely that you face collision on one cluster of machine, unless you need to process a really really huge volume of data (don’t forget, not everyone is Twitter or Facebook).

One of the most relevant use case for UUID, and espcecially TimeUUID, is to use it as column key. Since Cassandra column keys are sorted, we can take advantage of this feature to have a natural ordering for our column families.

The problem with the default com.eaio.uuid.UUID provided by the Hector client is that it’s not easy to work with. As an ID you may need to bring this value from the server up to the view layer, and that’s the gotcha.

Basically, com.eaio.uuid.UUID overrides the toString() to gives a String representation of the UUID. However this String formatting cannot be sorted lexicographically…

Below are some TimeUUID generated consecutively:

8e4cab00-c481-11e1-983b-20cf309ff6dc at some t1
2b6e3160-c482-11e1-addf-20cf309ff6dc at some t2 with t2 > t1


“2b6e3160-c482-11e1-addf-20cf309ff6dc”.compareTo(“8e4cab00-c481-11e1-983b-20cf309ff6dc”) gives -6 meaning that “2b6e3160-c482-11e1-addf-20cf309ff6dc” is less/before “8e4cab00-c481-11e1-983b-20cf309ff6dc” which is incorrect.

The current textual display of TimeUUID is split as follow:

time_low – time_mid – time_high_and_version – variant_and_sequence – node


If we re-order it starting with time_high_and_version, we can then sort it lexicographically:

time_high_and_version – time_mid – time_low – variant_and_sequence – node


The utility class is given below:

public static String reorderTimeUUId(String originalTimeUUID)
    {
        StringTokenizer tokens = new StringTokenizer(originalTimeUUID, "-");
        if (tokens.countTokens() == 5)
        {
            String time_low = tokens.nextToken();
            String time_mid = tokens.nextToken();
            String time_high_and_version = tokens.nextToken();
            String variant_and_sequence = tokens.nextToken();
            String node = tokens.nextToken();

            return time_high_and_version + '-' + time_mid + '-' + time_low + '-' + variant_and_sequence + '-' + node;

        }

        return originalTimeUUID;
    }


The TimeUUIDs become:

11e1-c481-8e4cab00-983b-20cf309ff6dc
11e1-c482-2b6e3160-addf-20cf309ff6dc


Now we get:

"11e1-c481-8e4cab00-983b-20cf309ff6dc".compareTo("11e1-c482-2b6e3160-addf-20cf309ff6dc") = -1

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  滥情空心        
                
              
                            
                2020-12-19 01:09
              
            
            
                                                                       
UUIDs Not Sequential

No, standard UUIDs are not meant to be sequential. 

Apparently some attempts were made with GUIDs (Microsoft's twist on UUIDs) to make them sequential to help with performance in certain database scenarios. But being sequential is not the intent of UUIDs.
http://en.wikipedia.org/wiki/Globally_unique_identifier

MAC Is Last, Not First

No, in standard UUIDs, the MAC address is not the first component. The MAC address is the last component in a Version 1 UUID.
http://en.wikipedia.org/wiki/Universally_unique_identifier

Do Not Assume Which Type Of UUID

The various versions of UUIDs are meant to be compatible with each other. So it may be unreasonable to expect that you always have Version 1 UUIDs. Other programmers may use other versions.

Specification

Read the UUID spec, RFC 4122, by the IETF. Only a dozen pages long. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2020-12-19 01:11
              
            
            
                                                                       
But not always:

>>> def test(n):
...     old = uuid.uuid1()
...     print old
...     for x in range(n):
...             new = uuid.uuid1()
...             if old >= new:
...                     print "OOops"
...                     break
...             old = new
...     print new
>>> test(1000000)
fd4ae687-3619-11e1-8801-c82a1450e52f
OOops
00000035-361a-11e1-bc9f-c82a1450e52f

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2020-12-19 01:16
              
            
            
                                                                       
From the python UUID docs:


  Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.


From this, I infer that the MAC address is first, then a (possibly random) sequence number, then the current time. So I would not expect these to be guaranteed to be monotonically increasing, even for UUIDs generated by the same machine/process.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复