How to stream a large gzipped .tsv file from s3, process it, and write back to a new file on s3?
I have a large file s3://my-bucket/in.tsv.gz that I would like to load and process, then write its processed version back to an s3 output file s3://my-bucket/out.tsv.gz.

1. How do I stream in.tsv.gz directly from s3 without loading the whole file into memory (it cannot fit in memory)?
2. How do I write the processed gzipped stream directly back to s3?

In the following code, I show how I was thinking of loading the input gzipped dataframe from s3, and how I would write the .tsv if it were located locally.
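A minimal sketch of that idea, assuming pandas with s3fs installed; the filtering step and the "score" column are placeholders for the actual processing:

```python
import pandas as pd

# Read the gzipped TSV straight from S3 into a dataframe
# (requires s3fs; note this loads the entire file into memory,
# which is exactly the problem for a file this large).
df = pd.read_csv(
    "s3://my-bucket/in.tsv.gz",
    sep="\t",
    compression="gzip",
)

# Placeholder processing step: keep rows where a hypothetical
# column "score" exceeds a threshold.
df = df[df["score"] > 0.5]

# Write the processed .tsv.gz as I would if the output were local.
df.to_csv(
    "out.tsv.gz",
    sep="\t",
    compression="gzip",
    index=False,
)
```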