How does Spark Structured Streaming handle backpressure?

Submitted by 三世轮回 on 2020-01-12 22:46:15

Question


I'm analyzing the backpressure feature in Spark Structured Streaming. Does anyone know the details? Is it possible to control the rate of incoming records in code? Thanks


Answer 1:


If you mean dynamically changing the size of each internal batch in Structured Streaming, then no. There are no receiver-based sources in Structured Streaming, so that is simply not necessary. From another point of view, Structured Streaming cannot do real backpressure, because Spark cannot, for example, tell other applications to slow down the speed at which they push data into Kafka.

Generally, Structured Streaming will try to process data as fast as possible by default. Some sources provide options to control the processing rate, such as maxFilesPerTrigger in the File source and maxOffsetsPerTrigger in the Kafka source. Read the following links for more details:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources
http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
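
To make the rate-limiting option concrete, here is a minimal Scala sketch using the File source. The input directory and the limit of 10 files are placeholder assumptions for illustration, not values from the answer:

```scala
import org.apache.spark.sql.SparkSession

object FileRateLimitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FileRateLimitExample")
      .getOrCreate()

    // maxFilesPerTrigger caps how many new files each micro-batch reads.
    // This bounds the work per trigger up front; it does not react to load.
    val lines = spark.readStream
      .format("text")
      .option("maxFilesPerTrigger", "10") // at most 10 files per micro-batch
      .load("/tmp/streaming-input")       // placeholder input directory

    val query = lines.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Note that this is a static cap, set before the query starts, which is exactly the point of the answer: you bound the rate, rather than having Spark adjust it dynamically.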




Answer 2:


Handling backpressure is needed only in push-based mechanisms. Kafka consumers are pull-based: Spark will pull the next batch of records only when the current batch has finished processing and been saved. If processing and saving are delayed in Spark, it won't pull a new batch of records, so there is no need for backpressure handling. If you want to reduce the number of records processed per Spark batch, set maxOffsetsPerTrigger, but that's not the same as backpressure.
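
For illustration, a minimal Scala sketch of capping the Kafka pull rate with maxOffsetsPerTrigger. The broker address and topic name are assumed placeholders, and the job needs the spark-sql-kafka connector on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object KafkaRateLimitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaRateLimitExample")
      .getOrCreate()

    // maxOffsetsPerTrigger caps the total number of offsets consumed per
    // micro-batch across all topic partitions. Spark still pulls at its own
    // pace; this option just limits how much it pulls per trigger.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "events")                       // placeholder topic
      .option("maxOffsetsPerTrigger", "10000")             // at most 10,000 records per batch
      .load()

    val query = df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```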



Source: https://stackoverflow.com/questions/44871621/how-spark-structured-streaming-handles-backpressure
