I am looking for functionality similar to Hadoop's distributed cache in Spark. I need a relatively small data file (with some index values) to be present on all nodes.
Broadcast variables should do what you need, and they remain effective even with larger datasets.
From the Spark documentation: "Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner."
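A minimal sketch of the broadcast approach. The lookup map and its contents here are hypothetical; the key API calls are `SparkContext.broadcast` and `Broadcast.value`:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical small lookup table of index values
    val indexValues: Map[Int, String] = Map(1 -> "a", 2 -> "b", 3 -> "c")

    // Ship the map to each executor once, instead of with every task
    val broadcastIndex = sc.broadcast(indexValues)

    // Tasks read the cached copy via .value
    val result = sc.parallelize(Seq(1, 2, 3))
      .map(id => broadcastIndex.value.getOrElse(id, "unknown"))
      .collect()

    result.foreach(println)
    spark.stop()
  }
}
```

Note that the broadcast value must be read-only; mutating it on an executor will not propagate anywhere.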
Please have a look at the SparkContext.addFile() method. I believe that is what you are looking for: it distributes a file to every node, where tasks can then read it from the local filesystem.
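A short sketch of this approach, assuming an existing SparkContext `sc`; the file path and name are hypothetical. The file is shipped once per node, and `SparkFiles.get` resolves its local path on each executor:

```scala
import org.apache.spark.SparkFiles

// Distribute the file to every node in the cluster
sc.addFile("/path/to/index-values.txt")

// On each executor, locate the node-local copy and read it
val lineCounts = sc.parallelize(1 to 4).map { _ =>
  val localPath = SparkFiles.get("index-values.txt")
  val source = scala.io.Source.fromFile(localPath)
  try source.getLines().size
  finally source.close()
}.collect()
```

Unlike a broadcast variable, this gives you an actual file on disk, which is closer to Hadoop's distributed cache semantics.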