Uploading 10,000,000 files to Azure blob storage from Linux

别跟我提以往 2021-01-18 06:05

I have some experience with S3, and in the past have used s3-parallel-put to put many (millions of) small files there. Compared to Azure, S3 has an expensive PUT price, so I'm thinking of switching to Azure. I can't, however, figure out how to sync a local directory to a remote container using the Azure CLI. In particular:

1. Does azure-cli support something like aws s3 sync?
2. Does the --concurrenttaskcount option make uploads of multiple files run in parallel?

3 Answers
  • 2021-01-18 06:12

https://github.com/Azure/azure-sdk-tools-xplat is the source of azure-cli, so more details can be found there. You may want to open issues on that repo :)

    1. azure-cli doesn't support "sync" so far.
    2. --concurrenttaskcount supports parallel upload within a single file, which increases the upload speed a lot, but it doesn't parallelize across multiple files yet; a workaround sketch follows this list.
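
    Until azure-cli grows multi-file parallelism, a common workaround is to run several upload processes side by side. A minimal sketch, assuming the classic xplat `azure storage blob upload` command, a pre-existing container named `mycontainer` (hypothetical), and credentials exported in the standard environment variables:

    ```bash
    # Sketch of a workaround (not from the answer): fan uploads out
    # across files with xargs, while azure-cli still parallelizes the
    # blocks within each file.
    # Assumes credentials are exported for the xplat CLI:
    #   export AZURE_STORAGE_ACCOUNT=myaccount   # hypothetical account
    #   export AZURE_STORAGE_ACCESS_KEY=...      # its access key
    cd /data/to/upload   # hypothetical source directory

    # GNU find's -printf '%P' drops the leading "./" so blob names
    # mirror the relative paths; -P 8 runs eight uploads concurrently.
    find . -type f -printf '%P\0' |
      xargs -0 -P 8 -I{} azure storage blob upload {} mycontainer {}
    ```

    Tune `-P` to your bandwidth and per-request latency; with millions of small files, the per-PUT round trip dominates, so more concurrent processes usually help until you hit throttling.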
  • 2021-01-18 06:26

To bulk-upload files into blob storage, Microsoft provides a tool: check out Azure Storage Explorer, which lets you do exactly this task.

  • 2021-01-18 06:27

If you prefer the command line and have a recent Python interpreter, the Azure Batch and HPC team has released a code sample with some AzCopy-like functionality, written in Python, called blobxfer. This allows full recursive directory ingress into Azure Storage as well as full container copy back out to local storage. [full disclosure: I'm a contributor to this code]

    To answer your questions:

    1. blobxfer supports rsync-like operations using MD5 checksum comparisons for both ingress and egress.
    2. blobxfer performs concurrent operations, both within a single file and across multiple files. However, you may want to split your input across multiple directories and containers, which will not only reduce the script's memory usage but also partition your load better. An example invocation follows this list.
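
    For reference, here is a sketch of an invocation against the current blobxfer 1.x CLI; the account name, key variable, container, and local path are hypothetical, and older code-sample releases used different flags, so check `blobxfer upload --help` for your version:

    ```bash
    # Recursively upload /data into the container "mycontainer",
    # skipping blobs whose MD5 already matches the local file: the
    # rsync-like behavior described in point 1.
    blobxfer upload \
        --storage-account myaccount \
        --storage-account-key "$STORAGE_KEY" \
        --remote-path mycontainer \
        --local-path /data \
        --skip-on md5_match
    ```

    Re-running the same command after a partial upload only transfers files whose checksums differ, which is what makes it practical for a 10,000,000-file job that may be interrupted.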