I am trying to use Spark Streaming with Kafka (Spark version 1.1.0), but the Spark job keeps crashing with this error:
14/11/21 12:39:23 ERROR TaskSetManager: Tas
Have you tried inputs.persist(StorageLevel.MEMORY_AND_DISK_SER)?
See, for example: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html
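A minimal sketch of that suggestion, assuming a receiver-based Kafka DStream created with KafkaUtils.createStream (the ZooKeeper address, group id, and topic name below are illustrative, not from the question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaPersistExample")
val ssc = new StreamingContext(conf, Seconds(2))

// Assumed Kafka parameters; adjust to your cluster.
val inputs = KafkaUtils.createStream(
  ssc, "zkhost:2181", "my-consumer-group", Map("my-topic" -> 1))

// Store received blocks serialized, and spill them to disk under memory
// pressure instead of dropping them, which is one way blocks end up
// missing when a later batch tries to compute its splits.
inputs.persist(StorageLevel.MEMORY_AND_DISK_SER)
```

This only changes where received blocks live; it does not slow down ingestion if processing cannot keep up.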
This is due to Spark Streaming's model. It collects data for a batch interval and hands the batch off to the Spark engine for processing. The engine is not aware that the data comes from a streaming system, and it does not communicate processing progress back to the streaming component.
This means there is no flow control (backpressure), unlike in native streaming systems such as Storm or Flink, which can smoothly throttle the spout/source based on the processing rate.
From https://spark.apache.org/docs/latest/streaming-programming-guide.html
One option to work around this would be to manually pass processing info/acknowledgements back to the receiver component; of course, this also means we need a custom receiver. At that point we are starting to build features that Storm/Flink etc. provide out of the box.
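Short of writing a custom receiver, a cruder mitigation is to cap the ingestion rate so receivers cannot outrun processing. This is a configuration sketch; the rate value is illustrative and needs tuning per workload:

```scala
import org.apache.spark.SparkConf

// Cap each receiver at a fixed number of records per second so the
// batch queue cannot grow without bound. This is a static limit, not
// true backpressure: it does not adapt to the actual processing rate.
val conf = new SparkConf()
  .setAppName("RateLimitedStream")
  .set("spark.streaming.receiver.maxRate", "10000")
```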
Check the following.
1) Did you create the streaming context properly, as in:
def functionToCreateContext(): StreamingContext = {
  val ssc = new StreamingContext(...)   // new context
  val lines = ssc.socketTextStream(...) // create DStreams
  ...
  ssc.checkpoint(checkpointDirectory)   // set checkpoint directory
  ssc
}
// Get StreamingContext from checkpoint data or create a new one
val context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
// Do additional setup on context that needs to be done,
// irrespective of whether it is being started or restarted
context. ...
// Start the context
context.start()
context.awaitTermination()
If your initialization does not follow this pattern, that is likely the problem.
Have a look at the RecoverableNetworkWordCount example that ships with Spark for a complete working version.
2) Have you enabled the write-ahead log property "spark.streaming.receiver.writeAheadLog.enable"?
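Enabling it is a one-line configuration change; note that the receiver write-ahead log was introduced in Spark 1.2, so it requires upgrading from 1.1.0, and it only takes effect when a checkpoint directory is also set:

```scala
import org.apache.spark.SparkConf

// Write received data to a log in the checkpoint directory before
// acknowledging it, so blocks can be recovered after a failure
// instead of producing "block not found" errors.
val conf = new SparkConf()
  .setAppName("WalEnabledStream")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
```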
3) Check the stability of the job in the Streaming UI: the processing time should stay below the batch interval.