Where does Spark look for text files?

执念已碎 · 2021-02-08 09:29

I thought that loading text files is done only from workers / within the cluster (you just need to make sure all workers have access to the same path, either by having that text…

2 Answers
  • 2021-02-08 09:48

    The really short version of the answer is: if you reference a "file://..." path, it should be accessible on all nodes in your cluster, including the driver program, since some bits of the work happen on the workers. Generally the way around this is to avoid local files entirely and instead use something like S3, HDFS, or another network filesystem. There is also the sc.addFile method, which can be used to distribute a file from the driver to all of the other nodes (you then use SparkFiles.get to resolve the download location).
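
    A minimal PySpark sketch of that addFile / SparkFiles.get pattern; the path /tmp/lookup.txt and the app name are hypothetical placeholders:

        from pyspark import SparkContext, SparkFiles

        sc = SparkContext(appName="addfile-example")

        # Distribute a driver-local file to every node in the cluster.
        # "/tmp/lookup.txt" is a hypothetical path on the driver machine.
        sc.addFile("/tmp/lookup.txt")

        def file_size(_):
            # On each executor, resolve where Spark downloaded the file
            # and read it from that local temporary location.
            path = SparkFiles.get("lookup.txt")
            with open(path) as f:
                return len(f.read())

        # Every task opens its own local copy of the distributed file.
        print(sc.parallelize(range(4), 4).map(file_size).collect())
        sc.stop()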

  • Spark can look for files either locally or on HDFS.

    If you'd like to read a file with sc.textFile() and take advantage of its RDD format, then the file should sit on HDFS. If you just want to read a file the normal way, you do it as you would in any ordinary program, depending on the API (Scala, Java, Python).
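
    As a rough sketch in PySpark (both paths and the HDFS address below are placeholders):

        from pyspark import SparkContext

        sc = SparkContext(appName="textfile-example")

        # Read from HDFS; this assumes a running HDFS namenode at this address.
        rdd_hdfs = sc.textFile("hdfs://namenode:8020/data/input.txt")

        # Read a local path; the file must exist at this exact path on the
        # driver and on every worker node.
        rdd_local = sc.textFile("file:///tmp/input.txt")

        print(rdd_hdfs.count(), rdd_local.count())
        sc.stop()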

    If you submit a local file alongside your driver, then sc.addFile() distributes the file to each node, and SparkFiles.get() resolves the local temporary path the file was downloaded to.
