hash unicode string in python

后端未结

关注

 3  2097

I try to hash some unicode strings:

hashlib.sha1(s).hexdigest()
UnicodeEncodeError: \'ascii\' codec can\'t encode characters in position 0-81: 
ordinal not in ra


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  情深已故        
                
              
                            
                2021-02-06 21:20
              
            
            
                                                                       
Apparently hashlib.sha1 isn't expecting a unicode object, but rather a sequence of bytes in a str object. Encoding your unicode string to a sequence of bytes (using, say, the UTF-8 encoding) should fix it:

>>> import hashlib
>>> s = u'é'
>>> hashlib.sha1(s.encode('utf-8'))
<sha1 HASH object @ 029576A0>


The error is because it is trying to convert the unicode object to a str automatically, using the default ascii encoding, which can't handle all those non-ASCII characters (since your string isn't pure ASCII).

A good starting point for learning more about Unicode and encodings is the Python docs, and this article by Joel Spolsky.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情话喂你        
                
              
                            
                2021-02-06 21:33
              
            
            
                                                                       
Use encoding format utf-8,  Try this easy way,

>>> import hashlib
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  夕颜        
                
              
                            
                2021-02-06 21:33
              
            
            
                                                                       
You hash bytes, not strings. So you gotta know what bytes you really want to hash, for example an utf8 memory representation of the string or a utf16 memory representation of the string, etc.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复