When saving as a text file in Spark version 1.5.1, I use rdd.saveAsTextFile('<directory>'). But if I want to find the file in that directory, how do I give it the name I want?
It's not possible to name the file as @nod said. However, it's possible to rename the file right afterward. An example using PySpark:
# explicitly use the classic FileOutputCommitter for the output
sc._jsc.hadoopConfiguration().set(
    "mapred.output.committer.class",
    "org.apache.hadoop.mapred.FileOutputCommitter")
# obtain a Hadoop FileSystem handle for the bucket via the Py4J gateway
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
fs = FileSystem.get(URI("s3://{bucket_name}"), sc._jsc.hadoopConfiguration())
file_path = "s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"
# remove data already stored at the target path, if any (recursive delete)
fs.delete(Path(file_path), True)
rdd.saveAsTextFile(file_path, compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
# rename the created part file to the desired name
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(
    created_file_path,
    Path(file_path + "{desired_name}.jl.gz"))
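Note that saveAsTextFile writes one part file per partition, so the glob above can match several files and the rename only handles the first. Coalescing to a single partition before saving guarantees exactly one part file to rename. A minimal sketch, assuming a local SparkContext (the master, sample data, and output path are illustrative):

from pyspark import SparkContext

sc = SparkContext("local[1]", "single-part-output")  # illustrative master and app name
example_rdd = sc.parallelize(["line one", "line two", "line three"])
# coalesce(1) collapses the RDD to a single partition, so exactly one
# part-00000 file is written and the glob/rename step has a single match
example_rdd.coalesce(1).saveAsTextFile("/tmp/single_part_output")

Be aware that coalesce(1) funnels all data through one partition, so this is only practical when the output comfortably fits on a single executor.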