I am now working on a small tool to request and decode a webpage, on which the Chinese characters are stored as strings like
\u5c0f\u738b\u5b50\u003a\u6c49
Those are Unicode codepoints already. They represent Chinese characters, but using escape codes that are easier on the developer:
>>> print u'\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167'
小王子:汉法英对照
You do not have to do anything to convert those; the \uxxxx escape form is simply another way to express the same codepoint. See the String Literals documentation:
\uxxxx: Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx: Character with 32-bit hex value xxxxxxxx (Unicode only)
Python interprets those escape codes when reading the source code to construct the unicode value.
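To see that the escape form and the literal characters are interchangeable, you can compare the two spellings directly (a minimal sketch in Python 3 syntax; in Python 2 you would prefix both literals with u):

```python
# The \uxxxx escapes and the literal characters spell the same codepoints
# (Python 3 syntax; in Python 2, prefix both literals with u).
escaped = '\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167'
literal = '小王子:汉法英对照'
print(escaped == literal)  # True: both produce the identical string
```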
If the source of the data is not Python source code but the web, you have JSON data instead, which uses the same escape format:
>>> import json
>>> print json.loads('"\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167"')
小王子:汉法英对照
Note that the value then needs to be part of a larger string, one that at least includes quotes to mark it as a JSON string.
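As a sketch of why the quotes matter (Python 3 syntax here): without them the input is not valid JSON at all, and json.loads raises a ValueError:

```python
import json

# With surrounding quotes the input is a valid JSON string.
print(json.loads(r'"\u5c0f\u738b\u5b50"'))  # 小王子

# Without quotes it is not a valid JSON document.
try:
    json.loads(r'\u5c0f\u738b\u5b50')
except ValueError:
    print('bare escapes are not valid JSON')
```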
Also note that the JSON string escape format differs from Python's when it comes to non-BMP (supplementary) codepoints; JSON handles those the way UTF-16 does, by encoding such a codepoint as a surrogate pair of two \uxxxx sequences. In Python you'd use a single \Uhhhhhhhh 32-bit hex escape instead.
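To illustrate that difference with a concrete non-BMP codepoint, take U+1F600 (😀): JSON spells it as a surrogate pair, while Python source uses one \U escape (a sketch in Python 3 syntax):

```python
import json

# JSON escapes U+1F600 as a UTF-16 surrogate pair: \ud83d\ude00.
from_json = json.loads(r'"\ud83d\ude00"')

# Python source uses a single 32-bit \U escape for the same codepoint.
from_source = '\U0001F600'

print(from_json == from_source)  # True: same codepoint either way
```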