Uploading 10,000,000 files to Azure blob storage from Linux

别跟我提以往 · 2021-01-18 06:05

I have some experience with S3, and in the past have used s3-parallel-put to put many (millions of) small files there. Compared to Azure, S3 has an expensive PUT p

3 Answers
  •  挽巷 · 2021-01-18 06:27

    If you prefer the command line and have a recent Python interpreter, the Azure Batch and HPC team has released a code sample, written in Python, with some AzCopy-like functionality called blobxfer. This allows full recursive directory ingress into Azure Storage as well as full container copy back out to local storage. [full disclosure: I'm a contributor to this code]
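    A minimal sketch of what such recursive directory ingress looks like, assuming the azure-storage-blob v12 Python SDK rather than blobxfer's own internals; the connection string, container name, and local path below are hypothetical placeholders:

        import os

        from azure.storage.blob import ContainerClient

        CONN_STR = "<your-connection-string>"  # hypothetical placeholder
        CONTAINER = "mycontainer"              # hypothetical placeholder
        LOCAL_ROOT = "/data/to/upload"         # hypothetical placeholder

        container = ContainerClient.from_connection_string(CONN_STR, CONTAINER)

        # Walk the directory tree and mirror its layout into the container.
        for dirpath, _dirnames, filenames in os.walk(LOCAL_ROOT):
            for name in filenames:
                local_path = os.path.join(dirpath, name)
                # Use the root-relative path as the blob name so the
                # directory structure is preserved in the container.
                blob_name = os.path.relpath(local_path, LOCAL_ROOT).replace(os.sep, "/")
                with open(local_path, "rb") as data:
                    container.upload_blob(name=blob_name, data=data, overwrite=True)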

    To answer your questions:

    1. blobxfer supports rsync-like operations using MD5 checksum comparisons for both ingress and egress
    2. blobxfer performs concurrent operations, both within a single file and across multiple files. However, you may want to split your input across multiple directories and containers, which will not only reduce memory usage in the script but also partition your load better (both points are illustrated in the sketch after this list)
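    Both points can be sketched together. The following is a minimal illustration, again assuming the azure-storage-blob v12 SDK rather than blobxfer itself: it skips files whose local MD5 matches the blob's stored Content-MD5 (point 1), and uploads concurrently while deterministically sharding blobs across several pre-created containers (point 2). The connection string, container names, shard count, and worker count are hypothetical placeholders:

        import hashlib
        import os
        import zlib
        from concurrent.futures import ThreadPoolExecutor

        from azure.core.exceptions import ResourceNotFoundError
        from azure.storage.blob import BlobServiceClient, ContentSettings

        CONN_STR = "<your-connection-string>"        # hypothetical placeholder
        CONTAINERS = ["shard0", "shard1", "shard2"]  # pre-created containers
        LOCAL_ROOT = "/data/to/upload"               # hypothetical placeholder

        service = BlobServiceClient.from_connection_string(CONN_STR)
        clients = [service.get_container_client(c) for c in CONTAINERS]

        def upload_if_changed(local_path: str) -> None:
            blob_name = os.path.relpath(local_path, LOCAL_ROOT).replace(os.sep, "/")
            # CRC32 is stable across runs (unlike hash() on str), so the
            # same blob always lands in the same container on a re-sync.
            shard = zlib.crc32(blob_name.encode("utf-8")) % len(clients)
            blob = clients[shard].get_blob_client(blob_name)
            # The files are small, so reading each fully into memory is fine.
            with open(local_path, "rb") as f:
                data = f.read()
            local_md5 = hashlib.md5(data).digest()
            try:
                remote_md5 = blob.get_blob_properties().content_settings.content_md5
                if remote_md5 is not None and bytes(remote_md5) == local_md5:
                    return  # unchanged since the last run; skip, rsync-style
            except ResourceNotFoundError:
                pass  # blob does not exist yet; upload it
            blob.upload_blob(
                data,
                overwrite=True,
                # Store the MD5 so the next run can compare against it.
                content_settings=ContentSettings(content_md5=local_md5),
            )

        paths = [
            os.path.join(dirpath, name)
            for dirpath, _dirnames, filenames in os.walk(LOCAL_ROOT)
            for name in filenames
        ]
        with ThreadPoolExecutor(max_workers=32) as pool:
            list(pool.map(upload_if_changed, paths))

    The deterministic shard choice is what makes the container split safe to re-run: a given file always maps to the same container, so the MD5 comparison finds the blob written on the previous pass.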
