What is the difference between the u'' prefix and unicode() in Python?


慢半拍i 2020-12-16 02:35

What is the difference between the u'' prefix and unicode()?

# -*- coding: utf-8 -*-
print u'上午'  # this works
print unicode('上午')  # this fails with UnicodeDecodeError


      
      
        
4 answers

醉梦人生 (original poster) 2020-12-16 02:53
              

            
            
                        

u'..' is a unicode string literal; Python decodes its characters according to the source encoding declaration.
unicode() is a function that converts another type to a unicode object. You've given it a byte string literal, and without an explicit codec it decodes a byte string using the default ASCII codec.


So you created a byte string object using a different type of literal notation, then tried to convert it to a unicode object. That fails because the default codec for str -> unicode conversions is ASCII, and every UTF-8 byte of '上午' lies outside the ASCII range.
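The failure described above can be reproduced in isolation. This is only a sketch: the name raw is made up for illustration, the bytes are the UTF-8 encoding of '上午' written as escapes so the snippet does not depend on a source encoding declaration, and the explicit .decode('ascii') call stands in for the implicit ASCII default that unicode() applies in Python 2.

```python
# UTF-8 bytes of the two characters, spelled out as escapes (illustrative name)
raw = b'\xe4\xb8\x8a\xe5\x8d\x88'

try:
    # Stand-in for the implicit ASCII default used by unicode('...') in Python 2
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    failing_byte_offset = exc.start  # position of the first undecodable byte

print(ascii_failed)
```

The first byte is already outside the ASCII range, so decoding stops at offset 0.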

The two are quite different beasts. If you want to use the latter, you need to give it an explicit codec:

print unicode('上午', 'utf8')


The two are related in the same way that using 0xFF and int('0xFF', 0) are related; the former defines an integer of value 255 using hex notation, the latter uses the int() function to extract an integer from a string.
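The integer analogy can be checked directly. The variable names below are illustrative only. The last part extends the analogy slightly, as an assumption about what the comparison is meant to convey: int() without a base argument defaults to base 10 and rejects the hex text, much as unicode() without a codec defaults to ASCII.

```python
literal_value = 0xFF            # hex literal, resolved when the source is compiled
parsed_value = int('0xFF', 0)   # base 0 tells int() to infer the base from the prefix
same = (literal_value == parsed_value == 255)

try:
    int('0xFF')                 # default base 10 cannot parse the '0x' prefix
    default_base_failed = False
except ValueError:
    default_base_failed = True

print(same)
print(default_base_failed)
```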

An alternative method would be to use the str.decode() method:

print '上午'.decode('utf8')
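As a rough check that the two spellings agree, the sketch below decodes the same bytes both ways. The names raw and text_type are assumptions introduced here: text_type resolves to unicode on Python 2 and to str on Python 3, so the snippet runs on either.

```python
raw = b'\xe4\xb8\x8a\xe5\x8d\x88'   # UTF-8 bytes of the two characters
text_type = type(u'')                # unicode on Python 2, str on Python 3

via_constructor = text_type(raw, 'utf8')  # same shape as unicode('...', 'utf8')
via_method = raw.decode('utf8')           # same shape as '...'.decode('utf8')

print(via_constructor == via_method)
print(len(via_method))
```

Both calls produce a two-character unicode object, so the choice between them is mostly a matter of style.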


Don't be tempted to use an error handler (such as 'ignore' or 'replace') unless you know what you are doing. 'ignore' especially can mask underlying issues, such as having picked the wrong codec.
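To make the masking concrete, here is a small sketch with illustrative names. It decodes the UTF-8 bytes of '上午' with the wrong codec (ASCII) under both handlers and compares the results with the correct decode.

```python
raw = b'\xe4\xb8\x8a\xe5\x8d\x88'   # UTF-8 bytes of the two characters

# Wrong codec plus 'ignore': every byte is silently dropped
ignored = raw.decode('ascii', 'ignore')

# Wrong codec plus 'replace': each bad byte becomes U+FFFD
replaced = raw.decode('ascii', 'replace')

# Right codec, no handler needed
correct = raw.decode('utf8')

print(len(ignored))
print(len(replaced))
print(len(correct))
```

With 'ignore' the result is an empty string and no exception is raised, so nothing signals that the codec was wrong.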

You may want to read up on Python and Unicode:


Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO

    
             
                                                        
            
            
              
                