When saving as a text file in Spark version 1.5.1, I use rdd.saveAsTextFile('<directory>'). But if I want to find the file in that directory, how do I give it the name I want?
It's not possible to name the file as @nod said. However, it's possible to rename the file right afterward. An example using PySpark:
# explicitly use the classic FileOutputCommitter for the output
sc._jsc.hadoopConfiguration().set(
    "mapred.output.committer.class",
    "org.apache.hadoop.mapred.FileOutputCommitter")
# obtain a Hadoop FileSystem handle for the bucket via the Py4J gateway
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
fs = FileSystem.get(URI("s3://{bucket_name}"), sc._jsc.hadoopConfiguration())
file_path = "s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"
# remove data already stored at the target path, if any (recursive delete)
fs.delete(Path(file_path), True)
rdd.saveAsTextFile(file_path, compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
# rename the created part file to the desired name
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(
    created_file_path,
    Path(file_path + "{desired_name}.jl.gz"))
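Note that saveAsTextFile writes one part file per partition, so the glob above can match several files and the rename only handles the first. Coalescing to a single partition before saving guarantees exactly one part file to rename. A minimal sketch, assuming a local SparkContext (the master, sample data, and output path are illustrative):

from pyspark import SparkContext

sc = SparkContext("local[1]", "single-part-output")  # illustrative master and app name
example_rdd = sc.parallelize(["line one", "line two", "line three"])
# coalesce(1) collapses the RDD to a single partition, so exactly one
# part-00000 file is written and the glob/rename step has a single match
example_rdd.coalesce(1).saveAsTextFile("/tmp/single_part_output")

Be aware that coalesce(1) funnels all data through one partition, so this is only practical when the output comfortably fits on a single executor.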