There are lots of questions like this, but nothing seems to help. I am trying to convert quite large csv.gz files to Parquet and keep getting various errors like
\'C
How many DPUs are you using? This article gives a nice overview of DPU capacity planning; hope that helps. There is no definite rulebook from AWS stating how many DPUs you need to process a file of a particular size.
I think the problem isn't directly connected to the number of DPUs. You have a large file and you are using the GZIP format, which is not splittable, so the whole file must be read by a single worker; that is why you have this problem.
I suggest converting your file from GZIP to bzip2 or LZ4. Additionally, you should consider partitioning the output data for better performance in the future, as sketched below the link.
http://comphadoop.weebly.com/
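For illustration, here is a minimal PySpark sketch of the partitioned-output idea; the bucket paths, header option, and partition column ("year") are hypothetical placeholders, not taken from the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-gz-to-parquet").getOrCreate()

# Reading a single .csv.gz forces the whole file onto one task,
# because GZIP is not a splittable codec.
df = (spark.read
      .option("header", "true")
      .csv("s3://my-bucket/input/large-file.csv.gz"))  # hypothetical path

# Repartition after the read so the downstream work is spread across
# executors, then write Parquet partitioned by a column such as "year".
(df.repartition(100)
   .write
   .mode("overwrite")
   .partitionBy("year")          # hypothetical partition column
   .parquet("s3://my-bucket/output/parquet/"))  # hypothetical path
```

The same job could instead read a bzip2- or LZ4-compressed input, in which case the initial read itself can be split across workers rather than only the stages after the repartition.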