Spark Streaming - obtain batch-level performance stats

Submitted by 邮差的信 on 2019-12-23 01:05:01

Question


I'm setting up an Apache Spark cluster to perform real-time streaming computations and would like to monitor the performance of the deployment by tracking various metrics such as batch sizes, batch processing times, etc. My Spark Streaming program is written in Scala.

Questions

  1. The Spark monitoring REST API description lists the various endpoints available. However, I couldn't find endpoints that expose batch-level info. Is there a way to get a list of all the Spark batches that have been run for an application, along with per-batch details such as the following:
    • Number of events per batch
    • Processing time
    • Scheduling delay
    • Exit status: i.e., whether the batch was processed successfully or not
  2. If such a batch-level API is unavailable, can batch-level stats (e.g., size, processing time, scheduling delay) be obtained by adding custom instrumentation to the Spark Streaming program?

Thanks in advance,


Answer 1:


If you have no luck with question 1, this should help with question 2:

import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted;

// Register the listener on the streaming context before calling ssc.start()
ssc.addStreamingListener(new JobListener());

// ...

class JobListener implements StreamingListener {

    @Override
    public void onBatchCompleted(StreamingListenerBatchCompleted batchCompleted) {
        // totalDelay() returns a scala.Option; it is defined once the batch has completed
        System.out.println("Batch completed, total delay: "
                + batchCompleted.batchInfo().totalDelay().get().toString() + " ms");
    }

    /*
    snipped other methods
    */
}

Taken from "In Spark Streaming, is there a way to detect when a batch has finished?"

batchCompleted.batchInfo() contains:

  • numRecords
  • batchTime, processingStartTime, processingEndTime
  • schedulingDelay
  • outputOperationInfos

Hopefully you can get what you need from those properties.
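Since the question's program is written in Scala, here is a minimal Scala sketch of the same approach; the listener class name and the ssc variable are placeholders, while the BatchInfo members it uses (numRecords, processingDelay, schedulingDelay, outputOperationInfos) come from the list above:

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// A minimal sketch: log batch-level stats as each batch completes.
// Register on your StreamingContext (assumed to be `ssc`) before ssc.start().
class BatchStatsListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // processingDelay and schedulingDelay are Option[Long] values in milliseconds;
    // they are only defined once the corresponding phase has actually run
    println(
      s"Batch at ${info.batchTime}: " +
      s"records=${info.numRecords}, " +
      s"processingTimeMs=${info.processingDelay.getOrElse(-1L)}, " +
      s"schedulingDelayMs=${info.schedulingDelay.getOrElse(-1L)}")
    // A rough stand-in for an "exit status": treat the batch as failed if any
    // of its output operations recorded a failure reason
    val failed = info.outputOperationInfos.values.exists(_.failureReason.nonEmpty)
    println(s"Batch succeeded: ${!failed}")
  }
}

ssc.addStreamingListener(new BatchStatsListener())

The -1 above is just a placeholder default for delays that are not yet defined. For completeness on question 1: newer Spark releases also document streaming-specific REST endpoints in the monitoring guide (e.g. /applications/[app-id]/streaming/batches), which may expose much of this per-batch information directly.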



Source: https://stackoverflow.com/questions/43335095/spark-streaming-obtain-batch-level-performance-stats
