Reliance on default encoding, what should I use and why?

爱一瞬间的悲伤 2020-12-28 14:36

FindBugs reports a bug:

Reliance on default encoding Found a call to a method which will perform a byte to String (or String to byte) conversion, a

4 answers
  • 2020-12-28 14:58

    Ideally, it should be:

    try (InputStream in = new FileInputStream(file);
         Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
         BufferedReader br = new BufferedReader(reader)) {
        // read from br; all three resources are closed automatically
    }

    ...or:

    try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
        // read from br
    }

    ...assuming the file is encoded as UTF-8.

    Pretty much every encoding that isn't a Unicode Transformation Format is obsolete for natural language data. There are languages you cannot support without Unicode.
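
    To illustrate the point about languages that need Unicode, here is a minimal sketch that asks two charsets whether they can represent a Japanese string. The class and method names are made up for this example; only the JDK calls are real.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class CharsetCoverage {

    /** Returns true if every character of text can be represented in charset. */
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the encoding of
        // this source file does not matter.
        String japanese = "\u65E5\u672C\u8A9E";

        System.out.println(canEncode(StandardCharsets.ISO_8859_1, japanese)); // false
        System.out.println(canEncode(StandardCharsets.UTF_8, japanese));      // true

        // String.getBytes does not fail on unmappable characters; it silently
        // substitutes the charset's replacement byte ('?' for ISO-8859-1).
        byte[] lossy = japanese.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1));   // ???
    }
}
```

    The silent substitution in the last step is why a wrong encoding often goes unnoticed until someone opens the file.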

  • 2020-12-28 14:59

    You should use the default encoding whenever you read a file that comes from outside your application and can be assumed to be in the user's local encoding, for example user-written text files. You might also want to use the default encoding when writing such files, depending on what the user is going to do with the file later.

    You should not use the default encoding for any other file, especially files your application itself depends on.

    If your application, for example, writes configuration files in text format, you should always specify the encoding. In general, UTF-8 is a good choice, as it is compatible with almost everything. Not doing so might cause surprising crashes for users in other countries.

    This is not limited to character encoding; it applies equally to date/time, numeric, and other language-specific formats. If, for example, you write a file using the default encoding and default date/time strings on a US machine and then try to read it on a German server, you might be surprised to find that one half is gibberish and the other half has months and days swapped or is off by one hour because of daylight saving time.
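
    The same idea applies to numbers and dates. The sketch below (class and method names are hypothetical) shows how the locale changes formatted output, and one way to pin the format so that a file reads the same on every machine.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
final class PortableFormats {

    /** Formats with grouping and two decimals using the given locale's symbols. */
    static String number(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    /** Locale-independent form, suitable for files that other machines will parse. */
    static String portableNumber(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    /** ISO-8601 calendar date, which avoids any month/day ambiguity. */
    static String isoDate(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(number(Locale.US, 1234.5));            // 1,234.50
        System.out.println(number(Locale.GERMANY, 1234.5));       // 1.234,50
        System.out.println(portableNumber(1234.5));               // 1234.50
        System.out.println(isoDate(LocalDate.of(2020, 12, 28)));  // 2020-12-28
    }
}
```

    Calling `String.format` without a `Locale` argument uses the JVM's default locale, which is the formatting counterpart of relying on the default encoding.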

  • 2020-12-28 15:01

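    When you are using a PrintWriter, wrap it around an OutputStreamWriter that names the charset explicitly (UTF-16 in this example):

    File file = new File(file_path);
    // Pass the Charset object itself; the String overload (.name()) forces you
    // to handle a checked UnsupportedEncodingException for no benefit.
    try (PrintWriter pw = new PrintWriter(
            new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_16))) {
        pw.println(content_to_write);
    } // try-with-resources closes the writer even if println throws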
    
  • 2020-12-28 15:07

    If the file is under the control of your application, and if you want the file to be encoded in the platform's default encoding, then you can use the default platform encoding. Specifying it explicitly makes it clearer, for you and for future maintainers, that this is your intention. This would be a reasonable default for a text editor, for example, since any other editor on the same platform would be able to read the files it writes.
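
    As a rough sketch of what "specifying it explicitly" could look like, the helper below passes `Charset.defaultCharset()` by name instead of relying on an overload that picks it implicitly. The class and method names are invented for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answer.
final class PlatformEncodedNotes {

    /** Writes text using the platform default charset, chosen on purpose. */
    static void save(Path path, String text) throws IOException {
        Charset platform = Charset.defaultCharset(); // deliberate, not accidental
        try (BufferedWriter out = Files.newBufferedWriter(path, platform)) {
            out.write(text);
        }
    }

    /** Reads the file back with the same, explicitly named, default charset. */
    static String load(Path path) throws IOException {
        return new String(Files.readAllBytes(path), Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("notes", ".txt");
        save(tmp, "hello");
        System.out.println(load(tmp));
        Files.delete(tmp);
    }
}
```

    The behavior is the same as the implicit form, but the intent is visible in the code and tools such as FindBugs no longer flag it. Keep in mind that on Java 18 and later `Charset.defaultCharset()` is UTF-8 unless the JVM is started with `-Dfile.encoding=COMPAT`, so it may not match the operating system's native encoding.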

    If, on the other hand, you want to make sure that any possible character can be written to your file, you should use a universal encoding like UTF-8.

    And if the file comes from an external application, or is supposed to be compatible with an external application, then you should use the encoding that this external application expects.

    What you must realize is that if you write a file with the default encoding on one machine and read it with the default encoding on another machine whose default differs, you won't necessarily be able to read what you have written. Using one specific encoding such as UTF-8 for both writing and reading makes sure the file is interpreted the same way, whatever platform wrote it.
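
    The mismatch can be reproduced on a single machine by encoding with one charset and decoding with another. In this sketch ISO-8859-1 merely stands in for "a different default encoding", and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class EncodingMismatch {

    /** Simulates writing text with one charset and reading it with another. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] onDisk = text.getBytes(writer); // what the writing machine stores
        return new String(onDisk, reader);     // what the reading machine sees
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "café", escaped so the source encoding is irrelevant

        // Same charset on both sides: the text survives.
        System.out.println(
            roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text)); // true

        // Different charsets: the two UTF-8 bytes of 'é' (0xC3 0xA9) are read
        // as two separate ISO-8859-1 characters.
        String garbled = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("caf\u00C3\u00A9")); // true
    }
}
```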
