Convert UTF8 to UTF16 using iconv

孤独总比滥情好 2021-02-03 22:14

Converting from UTF-16 to UTF-8 with iconv works fine, but the reverse conversion does not work. I have these files:

a-16.strings:    Little-endian UTF-16 Unicod         


        
3 Answers
  •  一向 (original poster)
    2021-02-03 22:46

    UTF-16LE tells iconv to generate little-endian UTF-16 without a BOM (Byte Order Mark). Apparently it assumes that since you specified LE, the BOM isn't necessary.

    UTF-16 tells it to generate UTF-16 text (in the local machine's byte order) with a BOM.
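
    To see the difference described above, the following sketch encodes a single "A" both ways and dumps the raw bytes. It assumes an iconv that accepts the names UTF-8, UTF-16LE and UTF-16; the variable names are only for illustration, and the exact bytes produced for plain UTF-16 depend on the iconv implementation and the machine.

```shell
# Encode the single character "A" both ways and dump the raw bytes.
# od -An -tx1 prints each byte in hex without the offset column;
# tr strips the spaces and newline so the result is one hex string.
le_bytes=$(printf 'A' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | tr -d ' \n')
bom_bytes=$(printf 'A' | iconv -f UTF-8 -t UTF-16 | od -An -tx1 | tr -d ' \n')

echo "UTF-16LE: $le_bytes"    # 4100: no BOM, low byte first
echo "UTF-16:   $bom_bytes"   # typically begins with fffe or feff (the BOM)
```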

    If you're on a little-endian machine, I don't see a way to tell iconv to generate big-endian UTF-16 with a BOM, but I might just be missing something.

    I find that the file command doesn't recognize UTF-16 text without a BOM, and your editor might not either. But if you run iconv -f UTF-16LE -t UTF-8 b-16.strings, you should get a valid UTF-8 version of the original file.

    Try running od -c on the files to see their actual contents.
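
    Along the same lines, here is a minimal sketch of a helper that looks only at the first two bytes of a file and reports which UTF-16 BOM, if any, it starts with. The function name bom_of is hypothetical, and the check is deliberately naive (for example, ff fe is also how a UTF-32LE BOM begins).

```shell
# bom_of FILE: report the byte order implied by the first two bytes of FILE.
bom_of() {
    # -N2 limits od to the first two bytes; tr leaves a string such as "fffe".
    first=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case "$first" in
        fffe) echo "little-endian BOM" ;;
        feff) echo "big-endian BOM" ;;
        *)    echo "no UTF-16 BOM" ;;
    esac
}
```

    For example, bom_of a-16.strings prints one of the three messages for that file.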

    UPDATE:

    It looks like you're on a big-endian machine (x86 is little-endian), and you're trying to generate a little-endian UTF-16 file with a BOM. Is that correct? As far as I can tell, iconv won't do that directly. But this should work:

    ( printf '\377\376' ; iconv -f utf-8 -t utf-16le UTF-8-FILE ) > UTF-16-FILE
    

    The behavior of the printf might depend on your locale settings; I have LANG=en_US.UTF-8.
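
    The BOM-prefix workaround can be wrapped in a small function, sketched below. The name utf8_to_utf16le_bom is hypothetical. It writes the BOM with octal escapes (\377\376), which POSIX printf is required to understand, whereas \x escapes are an extension that some shells lack.

```shell
# utf8_to_utf16le_bom SRC DST: write DST as little-endian UTF-16 with a BOM.
utf8_to_utf16le_bom() {
    {
        printf '\377\376'                 # BOM bytes FF FE, written in octal
        iconv -f UTF-8 -t UTF-16LE "$1"   # payload; iconv adds no BOM here
    } > "$2"
}
```

    Called as utf8_to_utf16le_bom UTF-8-FILE UTF-16-FILE, it returns iconv's exit status, so a failed conversion can be detected by the caller.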

    (Can anyone suggest a more elegant solution?)

    Another workaround, if you know the endianness of the output produced by -t utf-16:

    iconv -f utf-8 -t utf-16 UTF-8-FILE | dd conv=swab 2>/dev/null
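
    To illustrate what conv=swab does, this sketch encodes "A" as big-endian UTF-16 and swaps each byte pair. UTF-16BE is used here instead of UTF-16 only to make the result predictable; with -t utf-16 the BOM bytes are swapped in the same way, which is why the output stays self-consistent.

```shell
# Encode "A" as big-endian UTF-16, then swap every pair of bytes with dd.
before=$(printf 'A' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')
after=$(printf 'A' | iconv -f UTF-8 -t UTF-16BE | dd conv=swab 2>/dev/null | od -An -tx1 | tr -d ' \n')

echo "before swab: $before"   # 0041 (big-endian)
echo "after swab:  $after"    # 4100 (little-endian)
```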
    
