Is foreachRDD executed on the Driver?

礼貌的吻别 2021-02-13 23:06

I am trying to process some XML data received on a JMS queue (QPID) using Spark Streaming. After getting the XML as a DStream, I convert it to DataFrames so I can join them with some

2 answers
  •  南方客 (original poster)
    2021-02-14 00:06

    So does that mean all processing logic will run only on the driver and not be distributed to the workers/executors?

    No. The function you pass to foreachRDD runs on the driver, but it operates on an RDD. The operations you call on that RDD inside the function, such as foreachPartition, map and filter, still run on the worker nodes. None of the data is sent back over the network to the driver unless you call a method like collect, which returns the RDD's contents to the driver.
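To make the driver/executor split concrete, here is a minimal PySpark sketch. It is only an illustration under assumptions: the names `handle_batch`, `is_order`, `driver_log` and `main` are hypothetical, the sample records are invented, and `queueStream` stands in for the JMS/QPID receiver from the question. It assumes a Spark version that still ships the DStream API.

```python
driver_log = []  # hypothetical: a plain list that lives only in the driver process


def is_order(record):
    # Executed on the executors, once per record of the batch.
    return record.startswith("<order")


def handle_batch(rdd):
    # Called on the driver, once per batch interval.
    # filter() is lazy: this line only describes work, it touches no data.
    orders = rdd.filter(is_order)
    # count() is an action: the job runs on the executors and only the
    # resulting number travels back to the driver (unlike collect()).
    driver_log.append(orders.count())
    # Return nothing: foreachRDD is used for its side effects only.


def main():
    # Imported here so the handler above can be read without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "foreachRDD-demo")
    ssc = StreamingContext(sc, 1)  # 1-second batches
    # Assumption: an in-memory queue replaces the real JMS source.
    batch = sc.parallelize(["<order id='1'/>", "<note/>", "<order id='2'/>"])
    ssc.queueStream([batch]).foreachRDD(handle_batch)
    ssc.start()
    ssc.awaitTerminationOrTimeout(5)
    ssc.stop()
    print(driver_log)
```

Calling `main()` (for example from a script run with spark-submit) starts the stream. Appending to `driver_log` works only because `handle_batch` runs on the driver; the same append placed inside `is_order` would modify a copy on an executor and the driver would never see it.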
