Im trying to save a Spark DataFrame (of more than 20G) to a single json file in Amazon S3, my code to save the dataframe is like this :
dataframe.repartition(1)
s3a is not production version in Spark I think. I would say the design is not sound. repartition(1) is going to be terrible (what you are telling spark is to merge all partitions to a single one). I would suggest to convince the downstream to download contents from a folder rather than a single file