Question
When I write my dataframe to S3 using
df.write
.format("parquet")
.mode("overwrite")
.partitionBy("year", "month", "day", "hour", "gen", "client")
.option("compression", "gzip")
.save("s3://xxxx/yyyy")
I get the following in S3
year=2018
year=2019
but I would like to have this instead:
year=2018
year=2018_$folder$
year=2019
year=2019_$folder$
The scripts reading from that S3 location depend on the *_$folder$ entries, but I haven't found a way to configure Spark/Hadoop to generate them.
Any idea which Hadoop or Spark configuration settings control the generation of the *_$folder$ files?
Answer 1:
Those markers are a legacy feature; I don't think anything creates them any more. Even when they are present, they tend to be ignored when actually listing directories: they get stripped from listings and replaced with directory entries.
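If the downstream scripts really do need those marker objects, one workaround (this is not from the original answer; the bucket, prefix and partition names below are just the question's placeholders) is to create the empty *_$folder$ keys yourself after the write, for example through the Hadoop FileSystem API. A rough Scala sketch, assuming a SparkSession named spark and that the s3:// scheme resolves in your environment (EMRFS or s3a):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical post-write step: recreate the legacy "_$folder$" markers that
// older S3 connectors used to write. Paths below are the question's placeholders.
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(new URI("s3://xxxx"), conf)

Seq("year=2018", "year=2019").foreach { part =>
  val marker = new Path(s"s3://xxxx/yyyy/${part}_$$folder$$") // e.g. year=2018_$folder$
  if (!fs.exists(marker)) {
    fs.create(marker).close() // zero-byte object standing in for the marker
  }
}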
Source: https://stackoverflow.com/questions/55693083/how-can-i-configure-spark-so-that-it-creates-folder-entries-in-s3