Best way to convert text files between character sets?

再見小時候 2020-11-22 04:42

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

21 answers
  • 2020-11-22 05:06

    To write Java properties files, I normally use this on Linux (Mint and Ubuntu distributions):

    $ native2ascii filename.properties
    

    For example:

    $ cat test.properties 
    first=Execução número um
    second=Execução número dois
    
    $ native2ascii test.properties 
    first=Execu\u00e7\u00e3o n\u00famero um
    second=Execu\u00e7\u00e3o n\u00famero dois
    

    PS: I wrote "Execution number one/two" in Portuguese to force special characters.

    In my case, the first time I ran it I received this message:

    $ native2ascii teste.txt 
    The program 'native2ascii' can be found in the following packages:
     * gcj-5-jdk
     * openjdk-8-jdk-headless
     * gcj-4.8-jdk
     * gcj-4.9-jdk
    Try: sudo apt install <selected package>
    

    Installing the first option (gcj-5-jdk) solved the problem.

    I hope this helps someone.
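native2ascii no longer ships with current JDKs; it was removed in JDK 9. If it is unavailable, a rough shell-only approximation of its default (forward) escaping may still be useful. The sketch below assumes an iconv that accepts `UTF-16BE`, plus POSIX `od` and `awk`. It re-encodes UTF-8 input as UTF-16BE and prints every code unit outside ASCII as a `\uXXXX` escape, which is the form shown in the example output above. Characters outside the BMP therefore come out as surrogate-pair escapes. The function name `to_ascii_escapes` is hypothetical, and this sketch does not cover `native2ascii -reverse`.

```shell
# Hypothetical helper: read UTF-8 on stdin and write ASCII on stdout, escaping
# every non-ASCII UTF-16 code unit as \uXXXX (the style native2ascii produces).
to_ascii_escapes() {
  iconv -f UTF-8 -t UTF-16BE |   # two bytes per code unit, big-endian, no BOM
    od -An -v -tx1 |             # dump those bytes as lowercase hex, one per field
    awk '
      # Convert a lowercase hex string such as "00e7" to a number.
      function hexval(h,    i, v) {
        v = 0
        for (i = 1; i <= length(h); i++)
          v = v * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return v
      }
      {
        for (i = 1; i <= NF; i++) {
          if (hi == "") { hi = $i; continue }   # remember the high byte
          unit = hexval(hi $i); hi = ""         # combine high + low byte
          if (unit < 128) printf "%c", unit     # plain ASCII passes through
          else printf "\\u%04x", unit           # everything else is escaped
        }
      }'
}
```

Usage would be something like `to_ascii_escapes < test.properties > test.ascii.properties`.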

  • 2020-11-22 05:06

    Try EncodingChecker

    EncodingChecker on GitHub

    File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.

    File Encoding Checker requires .NET 4 or above to run.

    For encoding detection, File Encoding Checker uses the UtfUnknown Charset Detector library. UTF-16 text files without a byte order mark (BOM) can be detected by heuristics.
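EncodingChecker is a .NET GUI tool. On a Unix-like system you can get a much cruder command-line check with iconv alone, because a UTF-8 to UTF-8 conversion exits with a non-zero status when the input contains an invalid byte sequence. The sketch below relies on that behavior. The function name `check_utf8` is hypothetical. Note that it only validates and does not detect: plain ASCII files pass, and, unlike the UtfUnknown detector, it cannot tell you which encoding a failing file actually uses.

```shell
# Hypothetical helper: report whether each file named on the command line
# is valid UTF-8. Returns 1 if any file fails.
check_utf8() {
  local f status=0
  for f in "$@"; do
    # iconv exits non-zero on an invalid or truncated UTF-8 sequence
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
      printf '%s: valid UTF-8\n' "$f"
    else
      printf '%s: NOT valid UTF-8\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```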

  • 2020-11-22 05:09

    As described in "How do I correct the character encoding of a file?", Synalyze It! lets you easily convert between all encodings supported by the ICU library on OS X.

    Additionally, you can display some bytes of a file decoded to Unicode from every one of those encodings, to see quickly which one is right for your file.
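The "decode with every candidate and look" idea can be approximated in a shell, assuming iconv accepts the charset names listed. The candidate list and the function name `preview_encodings` are assumptions made for illustration. For the question's case, byte 0xA4 is a good tell: it is ¤ in ISO-8859-1 but € in ISO-8859-15. Because single-byte charsets like these decode almost any byte, only your eye can choose between them, whereas invalid UTF-8 is reported outright.

```shell
# Hypothetical helper: print the first few lines of a file decoded with several
# candidate charsets, so you can see by eye which one gives sensible text.
preview_encodings() {
  local file=$1 enc
  for enc in UTF-8 ISO-8859-1 ISO-8859-15 WINDOWS-1252; do
    printf '%-13s ' "$enc:"
    # The pipeline's status is iconv's; on undecodable input print a marker
    if ! head -n 3 "$file" | iconv -f "$enc" -t UTF-8 2>/dev/null; then
      printf '(not valid %s)\n' "$enc"
    fi
  done
}
```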

  • 2020-11-22 05:12

    Try an iconv Bash function

    I've put this into .bashrc:

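    utf8()
    {
        # Convert the file named in $1 from ISO-8859-1 to UTF-8 in place.
        # Only replace the original if iconv succeeded; otherwise discard the temp file.
        if iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp"; then
            mv "$1.tmp" "$1"
        else
            rm -f "$1.tmp"
            return 1
        fi
    }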
    

    ...so that I can convert files like this:

    utf8 MyClass.java
    
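A slightly more general sketch covers the question's case of UTF-8 to ISO-8859-15 and back. It takes the source and target charsets as arguments and replaces the file only when iconv succeeds, so a failed conversion leaves the original untouched. The function name `convert_charset` and the `.tmp` suffix are hypothetical choices.

```shell
# Hypothetical helper: convert a file in place between two charsets, e.g.
#   convert_charset UTF-8 ISO-8859-15 notes.txt
#   convert_charset ISO-8859-15 UTF-8 notes.txt
convert_charset() {
  local from=$1 to=$2 file=$3
  # Write to a temp file first, so a failed conversion keeps the original
  if iconv -f "$from" -t "$to" "$file" > "$file.tmp"; then
    mv "$file.tmp" "$file"
  else
    rm -f "$file.tmp"
    return 1
  fi
}
```

If the text contains a character that ISO-8859-15 cannot represent (for example "✓"), iconv reports an error and the file stays as it was. With GNU iconv you can instead append `//TRANSLIT` to the target charset, as in `convert_charset UTF-8 ISO-8859-15//TRANSLIT notes.txt`, to have such characters approximated.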
  • 2020-11-22 05:12

    Simply change the encoding of the loaded file in IntelliJ IDEA: click the current charset shown on the right side of the status bar (at the bottom). When it prompts you to Reload or Convert, choose Convert. Make sure you have backed up the original file first.

  • 2020-11-22 05:13

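    With Ruby (note that this reads the input as raw bytes and replaces every non-ASCII byte with an empty string, so it strips invalid byte sequences but also deletes accented characters instead of converting them):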

    ruby -e "File.write('output.txt', File.read('input.txt').encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ''))"
    

    Source: https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
