All inclusive Charset to avoid “java.nio.charset.MalformedInputException: Input length = 1”?

前端未结

关注

 10  1281

I\'m creating a simple wordcount program in Java that reads through a directory\'s text-based files.

However, I keep on getting the error:

java.nio.c


                      
              相关标签:


      
      
        
          10条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  长发绾君心        
                
              
                            
                2020-11-30 01:37
              
            
            
                                                                       
Well, the problem is that Files.newBufferedReader(Path path) is implemented like this :

public static BufferedReader newBufferedReader(Path path) throws IOException {
    return newBufferedReader(path, StandardCharsets.UTF_8);
}


so basically there is no point in specifying UTF-8 unless you want to be descriptive in your code. 
If you want to try a "broader" charset you could try with StandardCharsets.UTF_16, but you can't be 100% sure to get every possible character anyway.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2020-11-30 01:43
              
            
            
                                                                       
Creating BufferedReader from Files.newBufferedReader 

Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);


when running the application it may throw the following exception:

java.nio.charset.MalformedInputException: Input length = 1


But

new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));


works well.

The different is that, the former uses CharsetDecoder default action.


  The default action for malformed-input and unmappable-character errors is to report them.


while the latter uses the REPLACE action.

cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  执笔经年        
                
              
                            
                2020-11-30 01:43
              
            
            
                                                                       
you can try something like this, or just copy and past below piece.

boolean exception = true;
Charset charset = Charset.defaultCharset(); //Try the default one first.        
int index = 0;

while(exception) {
    try {
        lines = Files.readAllLines(f.toPath(),charset);
          for (String line: lines) {
              line= line.trim();
              if(line.contains(keyword))
                  values.add(line);
              }           
        //No exception, just returns
        exception = false; 
    } catch (IOException e) {
        exception = true;
        //Try the next charset
        if(index<Charset.availableCharsets().values().size())
            charset = (Charset) Charset.availableCharsets().values().toArray()[index];
        index ++;
    }
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2020-11-30 01:44
              
            
            
                                                                       
I wrote the following to print a list of results to standard out based on available charsets.  Note that it also tells you what line fails from a 0 based line number in case you are troubleshooting what character is causing issues.

public static void testCharset(String fileName) {
    SortedMap<String, Charset> charsets = Charset.availableCharsets();
    for (String k : charsets.keySet()) {
        int line = 0;
        boolean success = true;
        try (BufferedReader b = Files.newBufferedReader(Paths.get(fileName),charsets.get(k))) {
            while (b.ready()) {
                b.readLine();
                line++;
            }
        } catch (IOException e) {
            success = false;
            System.out.println(k+" failed on line "+line);
        }
        if (success) 
            System.out.println("*************************  Successs "+k);
    }
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复