All inclusive Charset to avoid “java.nio.charset.MalformedInputException: Input length = 1”?

醉话见心 2020-11-30 00:48

I'm creating a simple word-count program in Java that reads through a directory's text-based files.

However, I keep on getting the error:

java.nio.c         


        
10 Answers
  • 2020-11-30 01:37

    Well, the problem is that Files.newBufferedReader(Path path) is implemented like this:

    public static BufferedReader newBufferedReader(Path path) throws IOException {
        return newBufferedReader(path, StandardCharsets.UTF_8);
    }
    

    So there is no point in specifying UTF-8 explicitly unless you want your code to be descriptive. You could try another charset such as StandardCharsets.UTF_16, but that only helps if the file really is UTF-16: it is not "broader" than UTF-8, since both can encode every Unicode character. No single charset is guaranteed to decode every file correctly.
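
    To see the UTF-8 default in action, here is a minimal sketch, not a definitive fix. It writes a small temp file in ISO-8859-1 and reads it back with two charsets. The class name, the helper names and the sample text are made up for illustration. ISO_8859_1 is brought in as an assumption about what a lenient choice might look like: it maps every byte to some character, so strict decoding never throws, but the text will be wrong if the file is really in another encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DefaultCharsetDemo {

    /** Reads the first line of a file, decoding strictly with the given charset. */
    static String firstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            return reader.readLine();
        }
    }

    /** Returns true if strict decoding with the given charset hits malformed input. */
    static boolean failsWith(Path path, Charset charset) throws IOException {
        try {
            firstLine(path, charset);
            return false;
        } catch (MalformedInputException e) {
            return true;
        }
    }

    /** Creates a temp file holding "cafe" with an accented e, encoded as ISO-8859-1. */
    static Path sampleFile() throws IOException {
        Path path = Files.createTempFile("latin1-sample", ".txt");
        // The accented e becomes the single byte 0xE9, which is not valid UTF-8 in this position.
        Files.write(path, "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1));
        return path;
    }

    public static void main(String[] args) throws IOException {
        Path path = sampleFile();
        System.out.println("UTF-8 fails: " + failsWith(path, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 reads: " + firstLine(path, StandardCharsets.ISO_8859_1));
        Files.delete(path);
    }
}
```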

  • 2020-11-30 01:43

    If you create the BufferedReader with Files.newBufferedReader:

    Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);
    

    the application may throw the following exception at run time:

    java.nio.charset.MalformedInputException: Input length = 1
    

    But this:

    new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));
    

    works fine.

    The difference is that the former uses the CharsetDecoder default action:

    The default action for malformed-input and unmappable-character errors is to report them.

    while the latter uses the REPLACE action:

    cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
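
    The REPLACE configuration can also be used with the java.nio.file API. This is a rough sketch under that assumption: the class and method names (LenientReaderDemo, newLenientReader, readLinesLeniently) are hypothetical, and only standard JDK calls are used. Undecodable bytes come back as the replacement character U+FFFD instead of raising an exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LenientReaderDemo {

    /** Opens a reader that substitutes U+FFFD for undecodable bytes instead of throwing. */
    static BufferedReader newLenientReader(Path path, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(Files.newInputStream(path), decoder));
    }

    /** Reads all lines of the file with the lenient reader. */
    static List<String> readLinesLeniently(Path path, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = newLenientReader(path, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("lenient-sample", ".txt");
        // "ok", a newline, then 0xFF (a byte that never occurs in valid UTF-8), then "end".
        Files.write(path, new byte[] {'o', 'k', '\n', (byte) 0xFF, 'e', 'n', 'd', '\n'});
        for (String line : readLinesLeniently(path, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
        Files.delete(path);
    }
}
```

    The trade-off is that bad bytes are silently replaced, so this suits counting words better than it suits round-tripping data.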
    
  • 2020-11-30 01:43

    You can try something like this, or just copy and paste the piece below.

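    // Assumes f (File), keyword (String) and values (List<String>) are defined by the caller.
    // Needs java.io.IOException, java.nio.charset.Charset, java.nio.file.Files, java.util.ArrayList, java.util.List.
    List<Charset> candidates = new ArrayList<>();
    candidates.add(Charset.defaultCharset()); // Try the default one first.
    candidates.addAll(Charset.availableCharsets().values());

    for (Charset charset : candidates) {
        try {
            List<String> lines = Files.readAllLines(f.toPath(), charset);
            for (String line : lines) {
                line = line.trim();
                if (line.contains(keyword)) {
                    values.add(line);
                }
            }
            break; // No exception: stop at the first charset that decodes the file.
        } catch (IOException e) {
            // Decoding (or reading) failed: fall through to the next charset.
            // The loop ends once every charset has been tried, so it cannot spin forever.
        }
    }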
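
    A variation on the same idea, offered as a sketch rather than a drop-in replacement: read the bytes once and try a short, ordered list of candidate charsets in memory. The class name, the method name decodeWithFirstMatch and the candidate order are assumptions made for illustration. Decoding without an exception does not prove the charset is the right one, only that the bytes are valid in it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CharsetFallbackDemo {

    /** Returns the text decoded with the first candidate that accepts every byte, if any. */
    static Optional<String> decodeWithFirstMatch(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                // A fresh decoder reports malformed or unmappable input, so bad bytes throw.
                return Optional.of(cs.newDecoder().decode(ByteBuffer.wrap(bytes)).toString());
            } catch (CharacterCodingException e) {
                // This charset cannot decode the data; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sample data: a word with an i-diaeresis, encoded as ISO-8859-1 (not valid UTF-8).
        byte[] latin1 = "na\u00efve".getBytes(StandardCharsets.ISO_8859_1);
        List<Charset> order = Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(decodeWithFirstMatch(latin1, order).orElse("<undecodable>"));
    }
}
```

    For a file, the bytes could come from Files.readAllBytes(path). Putting a strict charset such as UTF-8 first and a catch-all such as ISO-8859-1 last keeps the search short and predictable.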
    
  • 2020-11-30 01:44

    I wrote the following to try every available charset and print the results to standard out. It also reports the 0-based number of the line on which reading failed, in case you are troubleshooting which character is causing the problem.

    public static void testCharset(String fileName) {
        SortedMap<String, Charset> charsets = Charset.availableCharsets();
        for (String k : charsets.keySet()) {
            int line = 0; // 0-based index of the line being read
            boolean success = true;
            try (BufferedReader b = Files.newBufferedReader(Paths.get(fileName), charsets.get(k))) {
                // readLine() returns null at end of file; ready() is not a reliable end-of-file test.
                while (b.readLine() != null) {
                    line++;
                }
            } catch (IOException e) {
                success = false;
                System.out.println(k + " failed on line " + line);
            }
            if (success) {
                System.out.println("*************************  Success " + k);
            }
        }
    }
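
    For troubleshooting, a byte offset can be more precise than a line number, since a buffered reader decodes in chunks and may fail before it has returned the preceding lines. This is a minimal sketch of that idea using CharsetDecoder directly. The class name, the method name firstBadOffset and the 1024-char scratch buffer are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class MalformedOffsetDemo {

    /** Returns the byte offset of the first undecodable input, or -1 if everything decodes. */
    static int firstBadOffset(byte[] bytes, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder(); // default action is REPORT
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024); // scratch space; decoded text is discarded
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On a reported error the input position marks where the bad bytes begin.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: make room and keep decoding
        }
    }

    public static void main(String[] args) {
        // "ok", a newline, then 0xFF (never valid in UTF-8), then "end".
        byte[] data = {'o', 'k', '\n', (byte) 0xFF, 'e', 'n', 'd'};
        System.out.println("first bad byte at offset " + firstBadOffset(data, StandardCharsets.UTF_8));
    }
}
```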
    