Zipping together unicode strings in Python


感动是毒 2021-01-24 18:18

I have the strings:

a = "ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ"
b = "àáâãäèéçêëìíîïòóôõöùúûüÿ"

and I want to create the string

"ÀàÁáÂâ..."


      
      
        
4 Answers

滥情空心 (OP) 2021-01-24 19:05
              

            
            
                        
In Python 2.x, string literals are byte strings (str) by default, not unicode. When dealing with unicode data, you have to do the following:


- Prefix string literals with the u character: a = u'ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ', or
- if you want to avoid the u prefix and the modules you are working with are compatible enough, use from __future__ import unicode_literals to make string literals unicode by default.
- If you write unicode string literals directly in your Python code, save your .py file as UTF-8 so that the literals are interpreted correctly. Python 2.3+ recognizes the UTF-8 BOM; it is also good practice to declare the encoding in a comment on the first or second line of the file, such as # -*- coding: utf-8 -*-, or
- you can keep saving the .py file as ASCII, but then you need to escape the non-ASCII characters in the literals, which is less readable: u'ÀÁÂÃ' becomes u'\xc0\xc1\xc2\xc3'.
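To make the points above concrete, here is a minimal sketch comparing a literal with its escaped form, and the character count with the UTF-8 byte count. The variable names are only illustrative, and the u prefix is used so the snippet should run on both Python 2.7 and Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; variable names are assumptions, not from the question.

literal = u"ÀÁÂÃ"              # requires the file to be saved as UTF-8
escaped = u"\xc0\xc1\xc2\xc3"  # pure-ASCII source, same four code points

same_text = (literal == escaped)

# A text (unicode) string counts characters; its UTF-8 encoding counts bytes.
encoded = literal.encode("utf-8")
char_count = len(literal)
byte_count = len(encoded)

print(same_text)   # True
print(char_count)  # 4
print(byte_count)  # 8
```

The difference between the two counts is why this matters for zipping: iterating over a Python 2 byte string yields single bytes, so zip would pair up fragments of characters instead of whole characters.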


Once those conditions are met, you can work with unicode strings exactly as you would with their str counterparts. Here is one possible solution for your problem, using the __future__ import:

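# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

from itertools import chain

a = "ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ"
b = "àáâãäèéçêëìíîïòóôõöùúûüÿ"

# zip() pairs the characters and stops at the end of the shorter string;
# chain.from_iterable() flattens the pairs back into one sequence.
print(''.join(chain.from_iterable(zip(a, b))))

Output:

ÀàÁáÂâÃãÈäÉèÊéËçÌêÍëÎìÏíÒîÓïÔòÕóÖôÙõÚöÛùÜú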
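One thing to be aware of: zip stops at the end of the shorter string, so the last three characters of b (ûüÿ) are missing from the output, and the pairs fall out of step after Ãã because b contains ä and ç, which have no uppercase counterpart in a. If you would rather keep the leftover characters, a possible variant (a sketch, assuming Python 3; the helper name interleave is made up for illustration) uses itertools.zip_longest, which is called izip_longest in Python 2:

```python
from itertools import chain, zip_longest


def interleave(first, second):
    """Alternate characters from both strings, keeping any leftover tail."""
    # fillvalue="" pads the shorter string with empty strings, so the
    # surplus characters of the longer one are appended unchanged.
    pairs = zip_longest(first, second, fillvalue="")
    return "".join(chain.from_iterable(pairs))


a = "ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ"
b = "àáâãäèéçêëìíîïòóôõöùúûüÿ"

result = interleave(a, b)
print(result)
```

This does not fix the alignment of upper and lower case pairs; for that, the two input strings themselves would have to list matching characters in the same positions.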


Further references:


- PEP 263 defines the source encoding declaration comments for non-ASCII source files
- PEP 3120 makes UTF-8 the default source encoding in Python 3

    
             
                                                        
            
            
              
                