Hadoop DistributedCache functionality in Spark
Question

I am looking for functionality similar to Hadoop's distributed cache in Spark. I need a relatively small data file (containing some index values) to be present on all nodes in order to perform some calculations. Is there any approach that makes this possible in Spark?

My workaround so far consists of distributing and reducing the index file as a normal processing step, which takes around 10 seconds in my application. After that, I persist the resulting file by marking it as a broadcast variable, as follows:
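A minimal Scala sketch of this broadcast-variable pattern is shown below; the file paths, the tab-separated key/value layout, and the `Map[String, Int]` index type are illustrative assumptions, not details taken from the original question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastIndexExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-index"))

    // Load the small index file and collect it to the driver as a local Map.
    // (Path and tab-separated format are assumptions for illustration.)
    val index: Map[String, Int] = sc.textFile("hdfs:///path/to/index.txt")
      .map { line =>
        val Array(key, value) = line.split("\t")
        key -> value.toInt
      }
      .collect()
      .toMap

    // Broadcast the index once; each executor keeps a read-only local copy
    // and accesses it through broadcastIndex.value.
    val broadcastIndex = sc.broadcast(index)

    // Use the broadcast index inside a transformation on the main dataset.
    val result = sc.textFile("hdfs:///path/to/data.txt")
      .map(record => (record, broadcastIndex.value.getOrElse(record, -1)))

    result.take(10).foreach(println)
    sc.stop()
  }
}
```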