When is hash(n) == n in Python?

前端未结


I've been playing with Python's hash function. For small integers, it appears that hash(n) == n always holds. However, this does not extend to large numbers.
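A quick way to reproduce the observation, as a minimal sketch assuming CPython 3 (the sample values `small` and `large` are arbitrary choices for illustration, not taken from the original question):

```python
# Compare hash(n) with n for a small and a very large integer (CPython 3 assumed).
small = 12345
large = 2 ** 200

small_matches = hash(small) == small   # small non-negative ints hash to themselves
large_matches = hash(large) == large   # a hash must fit in a machine word, so this cannot hold

print(small_matches, large_matches)
```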


                      
4 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2021-01-30 05:18
              
            
            
                                                                       
In Python 2, hash() returns a plain int, so the result always lies between -sys.maxint - 1 and sys.maxint. As a consequence, if you pass sys.maxint + x to it, the result wraps around to -sys.maxint + (x - 2).

import sys  # Python 2 only: sys.maxint does not exist in Python 3

hash(sys.maxint + 1) == sys.maxint + 1                          # False
hash(sys.maxint + 1) == -sys.maxint - 1                         # True
hash(sys.maxint + sys.maxint) == -sys.maxint + sys.maxint - 2   # True


Meanwhile, 2**200 is n times greater than sys.maxint. My guess is that the hash wraps around the range -sys.maxint..+sys.maxint n times until it lands on a plain integer in that range, as in the code snippets above.

So, in general, for any n <= sys.maxint:

hash(sys.maxint*n) == -sys.maxint*(n%2) + 2*(n%2)*sys.maxint - n/2 - (n + 1)%2  # True (n/2 is integer division in Python 2)


Note: this holds for Python 2.
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2021-01-30 05:20
              
            
            
                                                                       
2305843009213693951 is 2^61 - 1. It's the largest Mersenne prime that fits into 64 bits.

If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice: it's easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way.)

It's especially convenient to compute the modulus for floating point numbers.  They have an exponential component that multiplies the whole number by 2^x.  Since 2^61 = 1 mod 2^61-1, you only need to consider the (exponent) mod 61.
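The exponent shortcut can be checked directly. This is a minimal sketch assuming CPython 3.2 or later, where `sys.hash_info.modulus` exposes the prime; the helper name `predicted_hash_of_power_of_two` is hypothetical and only used for illustration:

```python
import sys

M = sys.hash_info.modulus   # 2**61 - 1 on typical 64-bit CPython builds
BITS = M.bit_length()       # 61 there, so 2**BITS == 1 (mod M)

def predicted_hash_of_power_of_two(k):
    """Predicted hash of 2.0**k for k >= 0: only k mod BITS matters."""
    return 1 << (k % BITS)

# Compare the prediction with the real hash for a few exponents.
exponents = (0, 1, 60, 61, 62, 200, 1000)
checks = [hash(2.0 ** k) == predicted_hash_of_power_of_two(k) for k in exponents]
print(all(checks))
```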

See: https://en.wikipedia.org/wiki/Mersenne_prime
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  刺人心        
                
              
                            
                2021-01-30 05:39
              
            
            
                                                                       
According to the comments in CPython's pyhash.c file:


  For numeric types, the hash of a number x is based on the reduction
     of x modulo the prime P = 2**_PyHASH_BITS - 1.  It's designed so that
     hash(x) == hash(y) whenever x and y are numerically equal, even if
     x and y have different types.
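The cross-type guarantee in that comment can be observed from Python. A small sketch, assuming CPython 3 and using only standard-library numeric types (the sample values 3 and 0.5 are arbitrary):

```python
from decimal import Decimal
from fractions import Fraction

# Numerically equal values of different types.
three = [3, 3.0, Fraction(3, 1), Decimal(3), complex(3, 0)]
half = [0.5, Fraction(1, 2), Decimal("0.5")]

# Each group should collapse to a single hash value.
three_hashes = {hash(v) for v in three}
half_hashes = {hash(v) for v in half}

print(len(three_hashes), len(half_hashes))
```

This is what allows 3 and 3.0 to act as the same dictionary key.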


So on both 64-bit and 32-bit machines the modulus is 2**_PyHASH_BITS - 1, but what is _PyHASH_BITS?

You can find it in the pyhash.h header file, where it is defined as 61 for a 64-bit machine (the pyconfig.h file has more explanation).

#if SIZEOF_VOID_P >= 8
#  define _PyHASH_BITS 61
#else
#  define _PyHASH_BITS 31
#endif
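The same platform check can be mirrored from Python. This is a sketch assuming CPython 3, where `struct.calcsize("P")` gives the size of a C pointer and `sys.hash_info.modulus` reports the prime actually in use; the variable names are illustrative:

```python
import struct
import sys

pointer_bytes = struct.calcsize("P")            # size of void* for this interpreter
hash_bits = 61 if pointer_bytes >= 8 else 31    # mirrors the #if in pyhash.h
expected_modulus = 2 ** hash_bits - 1

# On CPython this should agree with the value the interpreter reports.
print(expected_modulus == sys.hash_info.modulus)
```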


So, first of all, it depends on your platform. For example, on my 64-bit Linux machine the modulus is 2⁶¹ - 1, which is 2305843009213693951:

>>> 2**61 - 1
2305843009213693951


You can also use math.frexp to get the mantissa and exponent of sys.maxint (Python 2). On a 64-bit machine it shows that sys.maxint is about 2⁶³ (exactly 2⁶³ - 1, which rounds to 0.5 × 2⁶⁴ as a float):

>>> import math
>>> import sys
>>> math.frexp(sys.maxint)
(0.5, 64)


In Python 2 you can see the difference with a simple test:

>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False
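The test above reflects Python 2, where the boundary is sys.maxint. In CPython 3 the boundary is the modulus instead, and integer hashing can be modelled in a few lines. This is a sketch under that assumption; `model_int_hash` is a hypothetical helper written for illustration, not CPython's actual code:

```python
import sys

M = sys.hash_info.modulus   # 2**61 - 1 on typical 64-bit builds

def model_int_hash(n):
    """Pure-Python model of hash(n) for ints in CPython 3."""
    sign = -1 if n < 0 else 1
    h = sign * (abs(n) % M)        # reduce the magnitude modulo M, keep the sign
    return -2 if h == -1 else h    # -1 is never returned as a hash

samples = [0, 1, -1, -2, M - 1, M, M + 1, -M, -(M + 1),
           2 ** 62, 2 ** 63, 2 ** 200, -(2 ** 200)]
mismatches = [n for n in samples if hash(n) != model_int_hash(n)]
print(mismatches)
```

Under this model, hash(n) == n holds exactly for 0 <= n < M and for negative n with -M < n < -1.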


For the complete description of Python's hashing algorithm, see https://github.com/python/cpython/blob/master/Python/pyhash.c#L34

As mentioned in a comment, you can use sys.hash_info (in Python 3.x), which gives you a struct sequence of the parameters used for computing hashes.

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)


Alongside the modulus described above, you can also read the hash value used for infinity:

>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2021-01-30 05:40
              
            
            
                                                                       
The implementation for the int type in CPython (Python 2) is shown below.

It just returns the value, except for -1, in which case it returns -2:

static long
int_hash(PyIntObject *v)
{
    /* XXX If this is changed, you also need to change the way
       Python's long, float and complex types are hashed. */
    long x = v -> ob_ival;
    if (x == -1)
        x = -2;
    return x;
}
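The special case is visible from Python as well. A minimal sketch, assuming CPython (C-level hash functions use -1 as their error return value, so it cannot also be a valid hash):

```python
# hash(-1) is remapped to -2, so -1 and -2 share a hash value.
values = [-3, -2, -1, 0, 1]
hashes = [hash(v) for v in values]

print(hashes)
```

The collision is harmless: -1 == -2 is still False, so both can coexist as dictionary keys.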

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         