I have a custom stream that is used to perform write operations directly into the page cloud blob.
public sealed class WindowsAzureCloudPageBlobStream : Stream
{
If you don't mind working out of a file instead of a stream (or perhaps this has stream support and I don't know about it), look at the Azure Storage Data Movement Library. It's the best I've seen so far.
It's relatively new (at the time of writing) but has very good support for moving large files in chunks and maximizing throughput (I use it for nightly copying of SQL backups, many exceeding 1 GB in size).
https://azure.microsoft.com/en-us/blog/announcing-azure-storage-data-movement-library-0-2-0/
Usage is quite easy. Here's an example:
using System;
using System.Threading;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.DataMovement;

namespace BlobUploader
{
    public class Uploader
    {
        public string ConnectionString { get; set; }
        public string ContainerName { get; set; }
        public string BlobName { get; set; }

        public void UploadFile(string filePath)
        {
            // Resolve the destination container and blob, creating the container if needed.
            CloudStorageAccount account = CloudStorageAccount.Parse(ConnectionString);
            CloudBlobClient blobClient = account.CreateCloudBlobClient();
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(ContainerName);
            blobContainer.CreateIfNotExists();
            CloudBlockBlob destinationBlob = blobContainer.GetBlockBlobReference(BlobName);

            // Upper bound on the number of operations the library runs in parallel.
            TransferManager.Configurations.ParallelOperations = 64;

            // Print the running byte count as the transfer progresses.
            TransferContext context = new TransferContext();
            context.ProgressHandler = new Progress<TransferProgress>((progress) =>
            {
                Console.WriteLine("Bytes uploaded: {0}", progress.BytesTransferred);
            });

            // Start the upload and block until it completes.
            var task = TransferManager.UploadAsync(filePath, destinationBlob, null, context, CancellationToken.None);
            task.Wait();
        }
    }
}
The following preview blog post gives some information on how this came about and how it approaches things in general:
https://azure.microsoft.com/en-us/blog/introducing-azure-storage-data-movement-library-preview-2/
One simple and quick thing to check: make sure your blob storage is in the same Azure region where your VM or Application is running. One issue that we ran into was our storage account was in another region from our application. This caused us a significant delay during processing. We were scratching our heads until we realized we were reading and writing across regions. Rookie mistake on our part!
Like you, I had a lot of performance issues with page blobs as well - even though they were not this severe. It seems like you've done your homework, and I can see that you're doing everything by the book.
A few things to check:
ServicePointManager.DefaultConnectionLimit.
Writing from multiple threads (e.g. with Tasks / async / await, especially if you have a lot to do).
Oh, and one more thing:
The main reason your access times are slow is that you're doing everything synchronously. The benchmarks at Microsoft access the blobs from multiple threads, which gives more throughput.
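To make the advice above concrete, here is a minimal sketch of writing a buffer in 512-byte-aligned chunks with a bounded number of writes in flight. It is not the storage library's API: ParallelPageWriter, Split, WriteAsync and the writePages delegate are hypothetical names introduced for illustration, and the delegate stands in for whatever actually sends one range to the page blob.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Azure SDK.
public static class ParallelPageWriter
{
    // Page blob writes must start and end on 512-byte boundaries.
    public const int PageSize = 512;

    // Splits [0, totalLength) into (offset, length) ranges of at most chunkSize bytes.
    // Because this is an iterator, the argument check runs when enumeration starts.
    public static IEnumerable<(long Offset, int Length)> Split(long totalLength, int chunkSize)
    {
        if (totalLength % PageSize != 0 || chunkSize <= 0 || chunkSize % PageSize != 0)
            throw new ArgumentException("Lengths must be positive multiples of 512 bytes.");

        for (long offset = 0; offset < totalLength; offset += chunkSize)
            yield return (offset, (int)Math.Min(chunkSize, totalLength - offset));
    }

    // Writes all chunks of data, keeping at most maxParallel writes in flight.
    // writePages(offset, segment) is the caller-supplied function that performs one write.
    public static async Task WriteAsync(
        byte[] data,
        int chunkSize,
        int maxParallel,
        Func<long, ArraySegment<byte>, Task> writePages)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = Split(data.Length, chunkSize).Select(async range =>
            {
                await gate.WaitAsync();
                try
                {
                    var segment = new ArraySegment<byte>(data, (int)range.Offset, range.Length);
                    await writePages(range.Offset, segment);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

With the classic Microsoft.WindowsAzure.Storage client, the delegate could plausibly wrap CloudPageBlob.WritePagesAsync over a MemoryStream built from the segment; treat that wiring as an assumption and check it against the SDK version you use. ServicePointManager.DefaultConnectionLimit should be raised before the first request is made (on the .NET Framework it defaults to 2 for non-ASP.NET applications), otherwise the parallel writes just queue up behind two connections. The right values for chunkSize and maxParallel depend on your VM size and are best found by measuring.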
Now, Azure also knows that performance is an issue, which is why they've attempted to mitigate the problem by backing storage with local caching. What basically happens here is that they write the data locally (e.g. to a file), cut the work into pieces, and then use multiple threads to write everything to blob storage. The Azure Storage Data Movement Library is one such library. However, when using these libraries you should always keep in mind that they have different durability constraints (it's like enabling 'write caching' on your local PC) and might break the way you intended to set up your distributed system (if you read and write the same storage from multiple VMs).
Why...
You've asked for the 'why'. In order to understand why blob storage is slow, you need to understand how it works. First I'd like to point out that there is this presentation from Microsoft Azure that explains how Azure storage actually works.
The first thing you should realize is that Azure storage is backed by a distributed set of (spinning) disks. Because of the durability and consistency constraints, they also ensure that there's a 'majority vote' that the data is written to stable storage. For performance, several levels of the system will have caches, which will mostly be read caches (again, due to the durability constraints).
Now, the Azure team doesn't publish everything. Fortunately for me, 5 years ago my previous company created a similar system on a smaller scale. We had performance problems similar to Azure's, and the system was quite similar to the one in the presentation mentioned above. As such, I think I can explain and speculate a bit on where the bottlenecks are. For clarity I'll mark sections as speculation where I think this is appropriate.
If you write a page to blob storage, you actually set up a series of TCP/IP connections, store the page at multiple locations, and when a majority vote is received you give an 'ok' back to the client. Now, there are actually a few bottlenecks in this system:
Number (1), (2) and (3) here are quite well known. Number (4) here is actually the result of (1) and (2). Note that you cannot just throw an infinite number of requests at spinning disks; well... actually you can, but then the system will come to a grinding halt. So, in order to solve that, disk seeks from different clients are usually scheduled in such a way that you only seek if you know that you can also write everything (to minimize the expensive seeks). However, there's an issue here: if you want to push throughput, you need to start seeking before you have all the data - and if you're not getting the data fast enough, other requests have to wait longer.
Herein also lies a dilemma: you can either optimize for throughput by seeking early (this can sometimes hurt per-client throughput and stall everyone else, especially with mixed workloads) or buffer everything and then seek and write everything at once (this is easier, but adds some latency for everyone). Because of the vast number of clients that Azure serves, I suspect they chose the latter approach - which adds more latency to a complete write cycle.
Regardless of that, most of the time will probably be spent on (1) and (2). The actual data bursts and data writes are then quite fast. To give you a rough estimate: here are some commonly used timings.
So, that leaves us with one question: why is writing stuff in multiple threads so much faster?
The reason for that is actually very simple: if we write stuff in multiple threads, there's a high chance that we store the actual data on different servers. This means that we can shift our bottleneck from "seek + network setup latency" to "throughput". And as long as our client VM can handle it, it's very likely that the infrastructure can handle it as well.
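The shift from latency-bound to throughput-bound can be illustrated with a back-of-the-envelope model. This is only a sketch under stated assumptions: WriteTimeModel and its parameters are hypothetical, every write is assumed to pay the same fixed latency (connection setup plus seek), and concurrent writes are assumed to land on different servers without slowing each other down. The numbers in the usage note are made up for illustration, not measured Azure figures.

```csharp
using System;

// Hypothetical model, for illustration only.
public static class WriteTimeModel
{
    // Time for one write: fixed latency (network setup + seek) plus transfer time.
    public static double SingleWriteMs(double latencyMs, long bytes, double bytesPerMs)
    {
        return latencyMs + bytes / bytesPerMs;
    }

    // Total time when at most `parallelism` writes are in flight at once.
    // Assumes in-flight writes are fully independent, so the work proceeds in
    // ceil(writes / parallelism) rounds, each as long as a single write.
    public static double TotalMs(
        int writes, int parallelism, double latencyMs, long bytesPerWrite, double bytesPerMs)
    {
        int rounds = (writes + parallelism - 1) / parallelism; // ceiling division
        return rounds * SingleWriteMs(latencyMs, bytesPerWrite, bytesPerMs);
    }
}
```

For example, with 1000 writes of 64 KiB each, an assumed 10 ms of fixed latency per write and an assumed 100,000 bytes/ms of bandwidth, TotalMs(1000, 1, 10, 65536, 100000) comes to roughly 10.7 seconds, while TotalMs(1000, 16, 10, 65536, 100000) comes to roughly 0.67 seconds. Almost all of the sequential time is latency, which is exactly the part that parallel writes overlap. In practice the gain flattens out once the client VM's network or CPU becomes the limit.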