When I use iconv to convert from UTF-16 to UTF-8, everything works fine, but the other way around it does not. I have these files:
a-16.strings: Little-endian UTF-16 Unicode text
This may not be an elegant solution, but I found a manual way to ensure correct conversion for my problem, which I believe is similar to the subject of this thread.
The Problem:
I got a text data file from a user and was going to process it on Linux (specifically, Ubuntu) using a shell script (tokenization, splitting, etc.). Let's call the file myfile.txt. The first indication that something was amiss was that the tokenization was not working, so I was not surprised when I ran the file command on myfile.txt and got the following:
$ file myfile.txt
myfile.txt: Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
Had the file been compliant, the output should have looked like this:
$ file myfile.txt
myfile.txt: ASCII text, with very long lines
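The gap between those two file(1) reports is easy to reproduce; a small sketch, assuming GNU iconv and file(1) are available (the demo file names are illustrative, not from the original post):

```shell
#!/bin/sh
# Hedged sketch: reproduce the two file(1) reports above.
# Assumes GNU iconv and file(1); demo file names are illustrative.

# CRLF text re-encoded as UTF-16 (iconv prepends a BOM), like the
# problem file.
printf 'some data\r\n' | iconv -f UTF-8 -t UTF-16 > demo16.txt
file demo16.txt    # should mention UTF-16 Unicode text

# The compliant form: plain ASCII text with Unix line endings.
printf 'some data\n' > demo8.txt
file demo8.txt     # should mention ASCII text
```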
The Solution: To make the data file compliant, below are the three manual steps that I found to work after some trial and error.
Step 1: Convert the file to big-endian in the same encoding via vi (or vim): open it with vi myfile.txt, then in vi do :set fileencoding=UTF-16BE and write out the file. You may have to force the write with :wq!.
Step 2: Reopen the file with vi myfile.txt (it should now be UTF-16BE). In vi, do :set fileencoding=ASCII and write out the file. Again, you may have to force the write with :wq!.
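Steps 1 and 2 can also be driven non-interactively; a hedged sketch, assuming vim is installed (classic vi has no fileencoding option) and using an illustrative stand-in file:

```shell
#!/bin/sh
# Hedged sketch: steps 1 and 2 above, scripted instead of interactive.
# Assumes vim with iconv support; --cmd pins down encoding detection
# so the demo does not depend on local vimrc files.
command -v vim >/dev/null 2>&1 || exit 0   # skip where vim is absent

# Stand-in for the user's file: BOM-prefixed UTF-16, CRLF endings.
printf 'hello\r\n' | iconv -f UTF-8 -t UTF-16 > myfile.txt

# Step 1: re-save as big-endian UTF-16.
vim -es -u NONE --cmd 'set enc=utf-8 fencs=ucs-bom' \
    -c 'set fileencoding=utf-16be' -c 'wq!' myfile.txt

# Step 2: re-save as ASCII.
vim -es -u NONE --cmd 'set enc=utf-8 fencs=ucs-bom' \
    -c 'set fileencoding=ascii' -c 'wq!' myfile.txt
```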
Step 3: Run the dos2unix converter: d2u myfile.txt (on some systems the command is dos2unix). If you now run file myfile.txt, you should see output that is more familiar and reassuring, like:
myfile.txt: ASCII text, with very long lines
That's it. That's what worked for me, and I was then able to run my bash processing script on myfile.txt. I found that I could not skip Step 2; that is, in this case I could not jump directly to Step 3. Hopefully you find this info useful; perhaps someone can automate it, via sed or the like. Cheers.
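In the spirit of that automation, the whole sequence can be collapsed into one pipeline with iconv and tr; a hedged sketch, assuming GNU iconv and using illustrative file names (unlike the vi route, the big-endian intermediate of Step 2 does not seem to be needed when iconv does the conversion):

```shell
#!/bin/sh
# Hedged sketch: the three manual steps collapsed into one pipeline.
# Assumes GNU iconv; file names are illustrative. "-f UTF-16" consumes
# the BOM and picks the byte order (use UTF-16LE for BOM-less files).

# Stand-in for the user's file: BOM-prefixed UTF-16 text, CRLF endings.
printf 'hello world\r\nsecond line\r\n' | iconv -f UTF-8 -t UTF-16 > myfile.txt

# UTF-16 -> ASCII (//TRANSLIT approximates characters that have no
# ASCII counterpart instead of aborting), then strip the CR of each
# CRLF pair, as dos2unix would.
iconv -f UTF-16 -t ASCII//TRANSLIT myfile.txt | tr -d '\r' > myfile.clean.txt
mv myfile.clean.txt myfile.txt
```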