Killing spark streaming job when no activity

本秂侑毒 submitted on 2021-01-29 13:40:30

Question


I want to kill my Spark Streaming job when there is no activity (i.e. the receivers are not receiving messages) for a certain time. I tried doing this:

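// Number of consecutive empty batches seen so far.
var counter = 0

myDStream.foreachRDD { rdd =>
  if (rdd.isEmpty()) {
    counter += 1
    // After 40 empty batches in a row, stop the streaming job gracefully.
    if (counter == 40) {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  } else {
    // Any non-empty batch resets the idle counter.
    counter = 0
  }
}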

Is there a better way of doing this? How would I make a variable available to all receivers and increment it by 1 whenever there is no activity?


Answer 1:


Use a NoSQL table such as Cassandra or HBase to keep the counter. You cannot handle stream polling inside a loop. Implement the same logic using NoSQL or MariaDB, and perform a graceful shutdown of your streaming job when no activity is happening. The way I did it: I maintained a table in MariaDB for a streaming job with a polling interval of 5 minutes. Every 5 minutes the job hits the database and writes the count of records it consumed; the same method also returns the number of zero-record entries as of the latest timestamp. This helped me a lot in managing my streaming jobs. The table also lets me trigger the streaming job automatically, based on logic written in a shell script.
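
The idea described above (log each batch's record count, then shut down after a run of empty batches) might be sketched roughly as follows. This is only an illustration under assumptions, not the answerer's actual code: `BatchLog`, `InMemoryBatchLog` and `IdleShutdown` are hypothetical names, and the in-memory log merely stands in for the MariaDB, Cassandra or HBase table. The Spark wiring is shown in comments so the block compiles without Spark; it reuses `myDStream`, `ssc` and the limit of 40 from the question.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical abstraction over the external table that stores one row per batch.
trait BatchLog {
  def append(timestampMs: Long, recordCount: Long): Unit
  // Number of most recent consecutive batches that contained zero records.
  def trailingEmptyBatches: Int
}

// In-memory stand-in for the database table (assumption: a real job would use JDBC or a NoSQL client).
final class InMemoryBatchLog extends BatchLog {
  private val rows = ArrayBuffer.empty[(Long, Long)]

  def append(timestampMs: Long, recordCount: Long): Unit = synchronized {
    rows += ((timestampMs, recordCount))
  }

  def trailingEmptyBatches: Int = synchronized {
    rows.reverseIterator.takeWhile(_._2 == 0L).size
  }
}

object IdleShutdown {
  // True once the last `limit` batches were all empty.
  def shouldStop(log: BatchLog, limit: Int): Boolean =
    log.trailingEmptyBatches >= limit
}

// Possible wiring into the job from the question (requires Spark on the classpath):
//
//   val log = new InMemoryBatchLog
//   val stopRequested = new java.util.concurrent.atomic.AtomicBoolean(false)
//
//   myDStream.foreachRDD { (rdd, time) =>
//     log.append(time.milliseconds, rdd.count())
//     if (IdleShutdown.shouldStop(log, 40)) stopRequested.set(true)
//   }
//
//   ssc.start()
//   // Stop from the main thread rather than from inside the batch job.
//   while (!ssc.awaitTerminationOrTimeout(10000L)) {
//     if (stopRequested.get()) ssc.stop(stopSparkContext = true, stopGracefully = true)
//   }
```

The function passed to `foreachRDD` runs in the driver process, so the log (or a plain counter) does not need to be shared with the receivers. Requesting the stop through a flag and calling `ssc.stop` from the main thread is a design choice made here so that the graceful shutdown is not triggered from inside a batch that it would then have to wait for.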



Source: https://stackoverflow.com/questions/56521366/killing-spark-streaming-job-when-no-activity
