Convert UTF8 to UTF16 using iconv

孤独总比滥情好 2021-02-03 22:14

When I use iconv to convert from UTF-16 to UTF-8, everything works, but the reverse conversion (UTF-8 to UTF-16) does not. I have these files:

a-16.strings:    Little-endian UTF-16 Unicod         


        
3 answers
  •  遇见更好的自我
    2021-02-03 22:48

    This may not be an elegant solution but I found a manual way to ensure correct conversion for my problem which I believe is similar to the subject of this thread.

    The Problem: I got a text data file from a user and was going to process it on Linux (specifically, Ubuntu) with a shell script (tokenization, splitting, etc.). Let's call the file myfile.txt. The first sign that something was amiss was that the tokenization did not work. So I was not surprised when I ran the file command on myfile.txt and got the following:

    $ file myfile.txt
    
    myfile.txt: Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
    

    If the file were compliant, this is the output I should have seen:

    $ file myfile.txt
    
    myfile.txt: ASCII text, with very long lines
    

    The Solution: To make the datafile compliant, below are the 3 manual steps that I found to work after some trial and error with other steps.

    1. First convert to big-endian at the same encoding via vi (or vim): open the file with vi myfile.txt, run :set fileencoding=UTF-16BE, then write out the file. You may have to force the write with :wq!.

    2. Open the file again with vi myfile.txt (it should now be in UTF-16BE). Run :set fileencoding=ASCII, then write out the file. Again, you may have to force the write with :wq!.

    3. Run the dos2unix converter: d2u myfile.txt (the command is named dos2unix on many systems). If you now run file myfile.txt, you should see more familiar and reassuring output, like:

      myfile.txt: ASCII text, with very long lines
      

    That's it. That's what worked for me, and I was then able to run my bash processing script on myfile.txt. I found that I cannot skip Step 2; that is, in this case I cannot go directly from Step 1 to Step 3. Hopefully you find this info useful; hopefully someone can automate it, perhaps via sed or the like. Cheers.
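As a rough sketch of how the three manual steps might be automated, the function below uses iconv and tr instead of vi and d2u. It is not the method described above, only an approximation of it: the name utf16_to_ascii is hypothetical, and it assumes the input starts with a byte-order mark (as the file output suggests) and contains only characters that exist in ASCII.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): convert a UTF-16 file
# with CRLF line endings into an ASCII file with LF line endings.
# Assumption: the input starts with a byte-order mark, so iconv can detect
# the endianness when given the generic UTF-16 source encoding.
utf16_to_ascii() {
    u2a_src=$1
    u2a_dst=$2
    u2a_tmp=$(mktemp) || return 1

    # Steps 1 and 2 in one pass: decode UTF-16 and re-encode as ASCII.
    # iconv exits non-zero if a character cannot be represented in ASCII.
    if iconv -f UTF-16 -t ASCII "$u2a_src" > "$u2a_tmp"; then
        # Step 3: delete every carriage return, which turns CRLF into LF.
        tr -d '\r' < "$u2a_tmp" > "$u2a_dst"
        u2a_status=$?
    else
        u2a_status=1
    fi

    rm -f "$u2a_tmp"
    return "$u2a_status"
}
```

After sourcing the function, a call such as utf16_to_ascii myfile.txt myfile.ascii.txt writes the converted copy to a second file and leaves the original untouched. Note that tr removes all carriage returns, not only those in CRLF pairs, so this is slightly more aggressive than a real dos2unix run.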
