Batch Uploading Huge Sets of Images to Azure Blob Storage

抹茶落季 2021-02-06 01:08

I have about 110,000 images of various formats (jpg, png and gif) and sizes (2-40KB) stored locally on my hard drive. I need to upload them to Azure Blob Storage. While doing th

6 Answers
  •  忘了有多久
    2021-02-06 01:45

    Okay, here's what I did. I tinkered with running BeginUploadFromStream(), then BeginSetMetadata(), then BeginSetProperties() in an asynchronous chain, parallelized across 5-10 threads (a combination of ElvisLive's and knightpfhor's suggestions). This worked, but anything over 5 threads performed terribly, with each thread (working on a page of ten images at a time) taking upwards of 20 seconds to complete.

    So, to sum up the performance differences:

    • Asynchronous: 5 threads, each running an async chain, each working on ten images at a time (paged for statistical reasons): ~15.8 seconds (per thread).
    • Synchronous: 1 thread, ten images at a time (paged for statistical reasons): ~3.4 seconds.

    Okay, that's pretty interesting. A single instance uploading blobs synchronously finished each page roughly 4.6x faster (~3.4 vs. ~15.8 seconds) than each thread in the async approach. So even the best async balance, 5 threads, nets essentially the same overall throughput as one synchronous uploader.
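As a rough check on that conclusion, the timings above can be converted into aggregate throughput. This is a minimal sketch in Python (not the language of the original uploader), assuming every page holds exactly ten images and the five async threads run fully in parallel; the function and constant names are illustrative only.

```python
# Timings quoted in the post: seconds to upload one page of ten images.
IMAGES_PER_PAGE = 10
ASYNC_THREADS = 5
ASYNC_SECONDS_PER_PAGE = 15.8  # per thread
SYNC_SECONDS_PER_PAGE = 3.4    # single thread


def images_per_second(workers: int, seconds_per_page: float) -> float:
    """Aggregate throughput when each worker finishes one page in seconds_per_page."""
    return workers * IMAGES_PER_PAGE / seconds_per_page


async_rate = images_per_second(ASYNC_THREADS, ASYNC_SECONDS_PER_PAGE)
sync_rate = images_per_second(1, SYNC_SECONDS_PER_PAGE)

# Assumption: perfect parallelism across the five async threads.
print(f"async: {async_rate:.2f} images/s, sync: {sync_rate:.2f} images/s")
# prints "async: 3.16 images/s, sync: 2.94 images/s"
```

Under these assumptions the five-thread async chain moves only about 8% more images per second than one synchronous thread.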

    So, I tweaked my image file importing to separate the images into folders containing 10,000 images each. Then I used Process.Start() to launch one instance of my blob uploader per folder. I have 170,000 images to work with in this batch, so that means 17 instances of the uploader. With all of them running at once on my laptop, performance across the instances leveled out at ~4.3 seconds per set of ten images.
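The folder-per-process approach described above might look roughly like the following. This is a sketch in Python rather than the .NET code the post used; `split_into_folders`, `launch_uploaders`, the `batch_NNN` folder naming, and the uploader command are all hypothetical, and the uploader is assumed to take the folder path as its last argument.

```python
import shutil
import subprocess
from pathlib import Path

BATCH_SIZE = 10_000  # images per folder, as in the post


def split_into_folders(source: Path, dest: Path, batch_size: int = BATCH_SIZE) -> list[Path]:
    """Move the files in source into dest/batch_000, dest/batch_001, and so on."""
    files = sorted(p for p in source.iterdir() if p.is_file())
    folders = []
    for start in range(0, len(files), batch_size):
        folder = dest / f"batch_{start // batch_size:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for f in files[start:start + batch_size]:
            shutil.move(str(f), str(folder / f.name))
        folders.append(folder)
    return folders


def launch_uploaders(folders: list[Path], uploader_cmd: list[str]) -> list[int]:
    """Start one uploader process per folder, then wait for all; returns exit codes."""
    # All processes are started before any is awaited, so they run concurrently.
    procs = [subprocess.Popen([*uploader_cmd, str(folder)]) for folder in folders]
    return [p.wait() for p in procs]
```

A call such as `launch_uploaders(folders, ["BlobUploader.exe"])` (a placeholder executable name) would then start one process per folder, which for 170,000 images in batches of 10,000 means 17 processes.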

    Long story short, instead of trying to get threading working optimally, I just run a blob uploader instance for every 10,000 images, all on the one machine at the same time. Total performance boost?

    • Async Attempts: 14-16 hours, based on average execution time when running it for an hour or two.
    • Synchronous with 17 separate instances: ~1 hour, 5 minutes.
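These totals are roughly what the per-set timings predict. The sketch below (Python, illustrative names) assumes 170,000 images in sets of ten, an even split of sets across workers, and that the quoted per-set times hold for the whole run.

```python
TOTAL_IMAGES = 170_000
IMAGES_PER_SET = 10
total_sets = TOTAL_IMAGES // IMAGES_PER_SET  # 17,000 sets of ten images


def estimated_hours(workers: int, seconds_per_set: float) -> float:
    """Wall-clock hours if the sets are split evenly across concurrent workers."""
    return (total_sets / workers) * seconds_per_set / 3600


async_hours = estimated_hours(workers=5, seconds_per_set=15.8)
multi_hours = estimated_hours(workers=17, seconds_per_set=4.3)

print(f"async, 5 threads: ~{async_hours:.1f} h")
print(f"17 sync processes: ~{multi_hours * 60:.0f} min")
```

The async estimate comes out just under 15 hours, inside the 14-16 hour range quoted, and the 17-process estimate is about 72 minutes, close to the reported ~1 hour, 5 minutes.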
