I am working with AWS and I have workflows that use Spark and Hive. My data is partitioned by date, so every day I get a new partition in my S3 storage.
My problem is when a day's load goes wrong and I have to re-run it: how can I overwrite only that specific partition without touching the rest of the table?
I would suggest running SQL through the SparkSession. You can run an `INSERT OVERWRITE ... PARTITION (...)` query, selecting the columns from your existing dataset. This approach overwrites only the target partition and leaves the others intact.
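For illustration, here is a minimal PySpark sketch of that suggestion. The table name `events`, the column names, the S3 staging path, and the example date are all placeholders, not from the question:

```python
# Minimal sketch: overwrite a single date partition via Spark SQL.
# `events`, the column names, the S3 path, and the date are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("overwrite-single-partition")
    .enableHiveSupport()  # lets Spark SQL see the Hive metastore
    .getOrCreate()
)

# Load the corrected data for the bad day and expose it to SQL.
fixed_df = spark.read.parquet("s3://my-bucket/staging/2019-01-01/")
fixed_df.createOrReplaceTempView("fixed_day")

# INSERT OVERWRITE with a static PARTITION clause rewrites only that
# partition; every other date partition in the table is left untouched.
# The SELECT lists only the non-partition columns, because the partition
# value is fixed in the PARTITION clause.
spark.sql("""
    INSERT OVERWRITE TABLE events
    PARTITION (`date` = '2019-01-01')
    SELECT col1, col2, col3
    FROM fixed_day
""")
```

As a side note, on Spark 2.3+ you can get similar behavior from the DataFrame writer itself by setting `spark.sql.sources.partitionOverwriteMode` to `dynamic`, so that `mode("overwrite")` replaces only the partitions present in the incoming data.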