Is foreachRDD executed on the Driver?


礼貌的吻别 2021-02-13 23:06

I am trying to process some XML data received on a JMS queue (QPID) using Spark Streaming. After getting the XML as a DStream, I convert it to DataFrames so I can join them with some

2 Answers

南方客 (OP)

2021-02-14 00:06

So does that mean all processing logic will run only on the driver and not be distributed to the workers/executors?

No. The function you pass to foreachRDD runs on the driver, but it operates on an RDD. The operations you call on that RDD inside the function, such as foreachPartition, map, and filter, still run on the worker nodes. None of the data is sent back over the network to the driver unless you call a method such as collect, which does exactly that.
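To make the driver/executor split concrete, here is a minimal PySpark sketch. It is not the asker's code: the question does not show any, so the function names (is_xml, handle_partition, handle_batch, main) are hypothetical, and queueStream with a small in-memory batch stands in for the JMS/QPID receiver. Spark is imported inside main() so the handlers can be read and exercised without a cluster.

```python
def is_xml(record):
    # Used inside rdd.filter(...), so this predicate is evaluated on the executors.
    return record.lstrip().startswith("<")


def handle_partition(records):
    # Passed to rdd.foreachPartition(...), so it runs on an executor,
    # once per partition, with an iterator over that partition's records.
    for record in records:
        pass  # placeholder: e.g. parse the XML or write it to an external store


def handle_batch(rdd):
    # The body of this function runs on the driver, once per micro-batch.
    # It only describes and triggers distributed work on the RDD.
    xml_records = rdd.filter(is_xml)                # lazy transformation
    xml_records.foreachPartition(handle_partition)  # action, executed on executors
    # Calling rdd.collect() here would pull the whole batch back to the driver.


def main():
    # Assumes a local PySpark installation; "local[2]" runs two worker threads.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "foreachRDD-demo")
    ssc = StreamingContext(sc, 1)  # 1-second batches

    # Stand-in for the real receiver: a single pre-built batch with 2 partitions.
    batches = [sc.parallelize(["<a/>", "not xml", "<b/>"], 2)]
    ssc.queueStream(batches).foreachRDD(handle_batch)

    ssc.start()
    ssc.awaitTerminationOrTimeout(5)
    ssc.stop()
```

Calling main() on a machine with PySpark installed should process the one batch locally. Only handle_batch is invoked on the driver; is_xml and handle_partition are serialized and shipped to the executors.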
