How to skip invalid characters in a stream in Java/Scala?

前端未结


For example, I have the following code:

Source.fromFile(new File(path), "UTF-8").getLines()

and it throws an exception:

Exception in


                      
4 answers

情歌与酒, 2021-02-02 13:12

I had a similar issue, and one of Scala's built-in codecs did the trick for me:

Source.fromFile(new File(path))(Codec.ISO8859).getLines()

                                                                        
                                                        
            
            
              
                
礼貌的吻别, 2021-02-02 13:16

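To get past invalid characters in Scala, configure the codec to replace them instead of throwing. This worked for me:

import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object HelloWorld {

  def main(args: Array[String]): Unit = {
    // Replace malformed or unmappable input with U+FFFD instead of throwing.
    implicit val codec: Codec = Codec("UTF-8")
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)

    // Source.fromURL picks up the implicit codec defined above.
    val dataSource = Source.fromURL("https://www.foo.com")
    try {
      for (line <- dataSource.getLines()) {
        println(line)
      }
    } finally {
      dataSource.close() // release the underlying stream
    }
  }
}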
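For readers on plain Java, the same REPLACE policy can be sketched directly on a CharsetDecoder. This is only an illustration: the class name LenientUtf8 and the helper decodeReplacing are made up for this example, and the input bytes are an arbitrary sample with one invalid byte (0xFF) in the middle.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientUtf8 {
    // Hypothetical helper: decodes bytes as UTF-8 and substitutes U+FFFD
    // for every malformed or unmappable sequence instead of throwing.
    static String decodeReplacing(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares the exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xFF can never appear in valid UTF-8.
        byte[] input = {'a', (byte) 0xFF, 'b'};
        System.out.println(decodeReplacing(input));
    }
}
```

With REPLACE the bad byte becomes the replacement character U+FFFD, so the position of the damage stays visible in the output, which is usually easier to debug than silently dropping it.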

                                                                        
                                                        
            
            
              
                
旧时难觅i, 2021-02-02 13:20

Well, if it isn't UTF-8, it is something else. The trick is finding out what that something else is, but if all you want is to avoid the errors, you can use an encoding in which every byte is valid, such as latin1 (ISO-8859-1). Non-ASCII characters may come out wrong if the file is not really latin1, but decoding never fails:

Source.fromFile(new File(path), "latin1").getLines()
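To see why latin1 never fails while UTF-8 can, here is a small Java sketch that runs the same bytes through strict decoders for both charsets. The class CharsetProbe and the helper decodesStrictly are hypothetical names invented for this example, and StandardCharsets.ISO_8859_1 is used as the Java constant for latin1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetProbe {
    // Hypothetical helper: true if the bytes decode cleanly when the decoder
    // is told to report (throw on) every malformed or unmappable sequence.
    static boolean decodesStrictly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xE9 is 'é' in ISO-8859-1, but in UTF-8 it starts a multi-byte
        // sequence that is cut off here by the end of the input.
        byte[] sample = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println("UTF-8:      " + decodesStrictly(sample, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + decodesStrictly(sample, StandardCharsets.ISO_8859_1));
    }
}
```

A check like this can also serve as a rough first test when guessing a file's real encoding: if strict UTF-8 decoding succeeds on a large file with non-ASCII content, the file is very likely UTF-8.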

                                                                        
                                                        
            
            
              
                
忘掉有多难, 2021-02-02 13:23

You can control how charset decoding handles invalid input by calling CharsetDecoder.onMalformedInput.

Usually you won't ever see a CharsetDecoder object directly, because it is created behind the scenes for you. So if you need access to it, you'll need to use an API that lets you pass the CharsetDecoder itself (instead of just the encoding name or the Charset).

The most basic example of such an API is the InputStreamReader:

// Needs java.io.{InputStream, InputStreamReader, Reader} and
// java.nio.charset.{CharsetDecoder, CodingErrorAction, StandardCharsets}.
InputStream in = ...;
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
// Silently drop malformed byte sequences instead of throwing.
decoder.onMalformedInput(CodingErrorAction.IGNORE);
Reader reader = new InputStreamReader(in, decoder);


Note that this code uses the Java 7 class StandardCharsets. For earlier versions, you can simply replace it with Charset.forName("UTF-8") (or use the Charsets class from Guava).
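As a fuller illustration of the InputStreamReader approach, the sketch below reads lines from a stream while skipping invalid bytes. The class SkippingLineReader and its readLines helper are hypothetical names for this example, and an in-memory ByteArrayInputStream stands in for whatever real stream you have.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SkippingLineReader {
    // Hypothetical helper: reads all lines as UTF-8, dropping any byte
    // sequences that are not valid UTF-8 instead of throwing.
    static List<String> readLines(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data (an assumption for the demo): "ok", one invalid byte, newline, "next".
        byte[] data = {'o', 'k', (byte) 0xFF, '\n', 'n', 'e', 'x', 't'};
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
    }
}
```

IGNORE removes the bad bytes without a trace. If you would rather see where the damage was, CodingErrorAction.REPLACE substitutes U+FFFD at each spot instead.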
                                                                        
                                                        
            
            
              
                