How to name the file when using saveAsTextFile in Spark?

Backend · Unresolved · 3 answers · 1524 views
别跟我提以往 · 2021-02-20 17:45

When saving as a text file in Spark 1.5.1 I use: rdd.saveAsTextFile('').

But if I want to find the file in that directory, how do I give it the name I want?
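For context, saveAsTextFile writes a *directory* of part files (one per partition) plus a `_SUCCESS` marker, not a single file with the name you pass. A minimal sketch of that layout, simulated on the local filesystem (the file names are what Spark produces; the directory here is a temp dir stand-in):

```python
import os
import tempfile

out = tempfile.mkdtemp()  # stands in for the path given to saveAsTextFile

# Spark itself would create these; we simulate the layout for illustration
for name in ["part-00000", "part-00001", "_SUCCESS"]:
    open(os.path.join(out, name), "w").close()

print(sorted(os.listdir(out)))  # ['_SUCCESS', 'part-00000', 'part-00001']
```

This is why renaming has to happen after the write: the part-file names are fixed by Spark's output committer.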

3 Answers
  •  故里飘歌
    2021-02-20 18:38

    It's not possible to name the output file directly, as @nod said. However, you can rename the file right after writing it. An example using PySpark:

    # use the classic FileOutputCommitter so output lands as part-* files
    sc._jsc.hadoopConfiguration().set(
        "mapred.output.committer.class",
        "org.apache.hadoop.mapred.FileOutputCommitter")

    # reach the Hadoop FileSystem classes through the Py4J gateway
    URI = sc._gateway.jvm.java.net.URI
    Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
    FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
    fs = FileSystem.get(URI("s3://{bucket_name}"), sc._jsc.hadoopConfiguration())
    file_path = "s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"

    # remove data already stored, if necessary (True = recursive delete)
    fs.delete(Path(file_path), True)

    # saveAsTextFile is an RDD method, so write from the RDD, not the DataFrame
    rdd.saveAsTextFile(file_path, compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

    # rename the single part file that was created
    created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
    fs.rename(
        created_file_path,
        Path(file_path + "{desired_name}.jl.gz"))
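    The glob-then-rename step above can be sketched without a Spark cluster. This local-filesystem analogue uses Python's `glob` and `os.rename` in place of `fs.globStatus` and `fs.rename`; the part-file name mimics Spark's output, and `events.jl.gz` is a hypothetical target name:

    ```python
    import glob
    import os
    import tempfile

    out_dir = tempfile.mkdtemp()
    # simulate the single compressed part file left behind by saveAsTextFile
    open(os.path.join(out_dir, "part-00000.gz"), "wb").close()

    # pick the first matching part file, like fs.globStatus(...)[0] above
    created = sorted(glob.glob(os.path.join(out_dir, "part*.gz")))[0]
    desired = os.path.join(out_dir, "events.jl.gz")  # hypothetical name
    os.rename(created, desired)

    print(os.path.basename(desired))  # events.jl.gz
    ```

    The same caveat applies in both versions: this only works cleanly when the job writes a single part file (e.g. after `coalesce(1)` or `repartition(1)`); with multiple part files you would need to rename each one or merge them first.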
    
