Question
I need to download multiple files with gsutil, and I noticed that it uses a lot of memory when doing so (around 1-2 GB of RAM when downloading three 2 GB files with 9 processes each). Is there a way to tune gsutil's memory usage? This matters to me because I am running gsutil in GKE, where a container is killed if it uses more memory than its limit.
Another issue: it seems that gsutil cannot download files with the same name in a single command (one would overwrite the other?), so I am not using the -m option. Instead, I download each file with its own gsutil command: gsutil -o "GSUtil:parallel_thread_count=1" -o "GSUtil:sliced_object_download_component_size=250M" -o "GSUtil:sliced_object_download_max_components=9" -o "GSUtil:parallel_process_count=9" cp bucket/file desFile
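For illustration, the per-file approach described above could be scripted roughly as follows. This is only a sketch: the helper names `build_gsutil_cp` and `unique_destination`, the `gs://bucket-a/...` style URLs, and the `/tmp/dl` directory are hypothetical, and only the `-o` options are taken from the command in the question. The idea is to give each object a distinct local path so that two objects with the same file name do not overwrite each other.

```python
from pathlib import PurePosixPath


def build_gsutil_cp(src, dest, processes=9, threads=1,
                    component_size="250M", max_components=9):
    """Return the argv list for one `gsutil cp` call, using the -o options from the question."""
    return [
        "gsutil",
        "-o", f"GSUtil:parallel_thread_count={threads}",
        "-o", f"GSUtil:sliced_object_download_component_size={component_size}",
        "-o", f"GSUtil:sliced_object_download_max_components={max_components}",
        "-o", f"GSUtil:parallel_process_count={processes}",
        "cp", src, dest,
    ]


def unique_destination(src, dest_dir):
    """Flatten bucket and object path into the local file name (hypothetical naming scheme)."""
    prefix = "gs://"
    without_scheme = src[len(prefix):] if src.startswith(prefix) else src
    flat_name = without_scheme.replace("/", "__")
    return str(PurePosixPath(dest_dir) / flat_name)


# Hypothetical objects that share the file name "file.bin".
sources = ["gs://bucket-a/data/file.bin", "gs://bucket-b/data/file.bin"]
commands = [build_gsutil_cp(s, unique_destination(s, "/tmp/dl")) for s in sources]
for cmd in commands:
    print(" ".join(cmd))
```

Each resulting list can be passed to `subprocess.run(cmd, check=True)` to perform the actual download, one file at a time.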
Answer 1:
I tested downloading a 2 GB file, and changing -o "GSUtil:parallel_process_count=X" does change memory consumption on Debian and Ubuntu:
- 1 parallel process: 85MB
- 5 parallel processes: 125MB
- 10 parallel processes: 165MB
- 50 parallel processes: 310MB
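As a rough reading of these figures, the memory added per extra process can be computed between consecutive measurements. This is simple arithmetic on the four data points above, not a model of how gsutil allocates memory; the function name is made up for the example.

```python
# (parallel_process_count, observed memory in MB), copied from the list above.
measurements = [(1, 85), (5, 125), (10, 165), (50, 310)]


def marginal_mb_per_process(points):
    """Return the MB added per extra process between consecutive measurements."""
    return [
        (mem_b - mem_a) / (n_b - n_a)
        for (n_a, mem_a), (n_b, mem_b) in zip(points, points[1:])
    ]


steps = marginal_mb_per_process(measurements)
print(steps)  # [10.0, 8.0, 3.625]
```

In these measurements the cost of each additional process falls from about 10 MB to under 4 MB as the count grows, on top of a baseline of about 85 MB for a single process.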
If you get kernel panics on GKE when using gsutil with a CentOS container image, switching to an Ubuntu image should help.
If memory consumption is too high with 3 simultaneous file downloads, consider running only 1 or 2 downloads at a time.
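One way to cap the number of simultaneous downloads is a small worker pool that starts each command as a subprocess. The sketch below is an assumption about how this might be scripted, not something from the original answer; `run_limited` is a hypothetical helper, and the commands it receives are expected to be argv lists for whatever `gsutil cp` invocations you use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_limited(commands, max_concurrent=2):
    """Run each argv list as a subprocess, at most `max_concurrent` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(argv):
        # Each worker thread blocks until its subprocess has finished.
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_one, commands))


# Example call (hypothetical bucket and paths, not executed here):
# run_limited([
#     ["gsutil", "cp", "gs://bucket-a/file1", "/tmp/dl/file1"],
#     ["gsutil", "cp", "gs://bucket-a/file2", "/tmp/dl/file2"],
#     ["gsutil", "cp", "gs://bucket-a/file3", "/tmp/dl/file3"],
# ], max_concurrent=1)
```

With `max_concurrent=1` the downloads run strictly one after another, so at most one gsutil invocation holds memory at any moment.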
There are also known issues with high memory usage on GKE.
Source: https://stackoverflow.com/questions/56797730/gsutil-uses-a-lot-of-memory-when-download-multiple-files-with-a-lot-of-processes