Question
I'm currently working on a Service Fabric microservice which needs to have high throughput.
I wondered why I was not able to achieve more than 500 1KB messages per second on my workstation using loopback.
I removed all the business logic and attached a performance profiler, just to measure end-to-end performance.
It seems that ~96% of the time is spent resolving the client and only ~2% doing the actual HTTP requests.
I'm invoking "Send" in a tight loop for the test:
private HttpCommunicationClientFactory factory = new HttpCommunicationClientFactory();

public async Task Send()
{
    var client = new ServicePartitionClient<HttpCommunicationClient>(
        factory,
        new Uri("fabric:/MyApp/MyService"));

    await client.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url + "/test"));
}
Any ideas on this? According to the documentation, the way I call the services seems to be the Service Fabric best practice.
UPDATE: Caching the ServicePartitionClient does improve the performance, but with partitioned services I'm unable to cache the client, since I don't know the partition for a given PartitionKey.
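One way a cache could still be keyed is on the ServicePartitionKey itself rather than on the resolved partition, since the client/factory resolves the actual partition lazily on first use. Below is a minimal sketch under that assumption, reusing the HttpCommunicationClientFactory and HttpCommunicationClient from the code above; the PartitionClientCache class and its Send method are illustrative names, not Service Fabric API.

// Sketch only: one ServicePartitionClient per ServicePartitionKey, created lazily.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Communication.Client;

public class PartitionClientCache
{
    private readonly HttpCommunicationClientFactory _factory = new HttpCommunicationClientFactory();
    private readonly Uri _serviceUri = new Uri("fabric:/MyApp/MyService");

    // Keyed by the partition key you already know; the factory resolves the actual
    // partition/endpoint on first use and the cached client reuses that resolution.
    private readonly ConcurrentDictionary<long, ServicePartitionClient<HttpCommunicationClient>> _clients =
        new ConcurrentDictionary<long, ServicePartitionClient<HttpCommunicationClient>>();

    public Task Send(long partitionKey)
    {
        var client = _clients.GetOrAdd(
            partitionKey,
            key => new ServicePartitionClient<HttpCommunicationClient>(
                _factory, _serviceUri, new ServicePartitionKey(key)));

        return client.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url + "/test"));
    }
}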
UPDATE 2: I'm sorry that I didn't include full details in my initial question. We noticed the huge overhead of InvokeWithRetry when initially implementing socket-based communication.
You won't notice it that much if you are using HTTP requests. An HTTP request already takes ~1ms, so adding 0.5ms for InvokeWithRetry isn't that noticeable.
But with raw sockets, which in our case take ~0.005ms, adding 0.5ms of overhead for InvokeWithRetry is immense!
Here is an HTTP example; with InvokeWithRetry it takes 3x as long:
public async Task RunTest()
{
    var factory = new HttpCommunicationClientFactory();
    var uri = new Uri("fabric:/MyApp/MyService");
    var count = 10000;

    // Example 1: ~6000ms
    for (var i = 0; i < count; i++)
    {
        var pClient1 = new ServicePartitionClient<HttpCommunicationClient>(factory, uri, new ServicePartitionKey(1));
        await pClient1.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url));
    }

    // Example 2: ~1800ms
    var pClient2 = new ServicePartitionClient<HttpCommunicationClient>(factory, uri, new ServicePartitionKey(1));

    HttpCommunicationClient resolvedClient = null;
    await pClient2.InvokeWithRetryAsync(
        c =>
        {
            resolvedClient = c;
            return Task.FromResult(true);
        });

    for (var i = 0; i < count; i++)
    {
        await resolvedClient.HttpClient.GetAsync(resolvedClient.Url);
    }
}
I'm aware that InvokeWithRetry adds some nice things I don't want to lose from the clients. But does it need to resolve the partition on every call?
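For illustration, here is a sketch of what I mean: resolve the partition once via ServicePartitionResolver, keep the ResolvedServicePartition, and re-resolve (passing the previous result) only after a failed call. The CachedResolutionCaller class and its members below are illustrative only, and the endpoint address format depends on how the service publishes its listener.

// Illustrative sketch: resolve once, cache the ResolvedServicePartition, and ask the
// resolver for a fresh view (passing the previous result) only after a failed call.
using System;
using System.Fabric;   // ResolvedServicePartition
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

public class CachedResolutionCaller
{
    private readonly ServicePartitionResolver _resolver = ServicePartitionResolver.GetDefault();
    private readonly HttpClient _httpClient = new HttpClient();
    private readonly Uri _serviceUri = new Uri("fabric:/MyApp/MyService");
    private ResolvedServicePartition _rsp;

    public async Task<HttpResponseMessage> GetAsync(CancellationToken ct)
    {
        if (_rsp == null)
        {
            // Full resolution only on the first call.
            _rsp = await _resolver.ResolveAsync(_serviceUri, new ServicePartitionKey(1), ct);
        }

        // The address format depends on how the service's listener publishes its endpoint.
        var address = _rsp.GetEndpoint().Address;
        try
        {
            return await _httpClient.GetAsync(address + "/test", ct);
        }
        catch (HttpRequestException)
        {
            // The replica may have moved; re-resolve based on the previous result and
            // let the caller retry.
            _rsp = await _resolver.ResolveAsync(_rsp, ct);
            throw;
        }
    }
}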
Answer 1:
I thought it would be nice to actually benchmark this and see what the difference actually was. I created a basic setup with a stateful service that opens an HttpListener and a client that calls that service in three different ways (a sketch of a timing harness for comparing them follows the three variants):
1. Creating a new client for each call and doing all the calls in sequence
for (var i = 0; i < count; i++)
{
    var client = new ServicePartitionClient<HttpCommunicationClient>(_factory, _httpServiceUri, new ServicePartitionKey(1));
    var httpResponseMessage = await client.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url + $"?index={id}"));
}
2. Creating the client only once and reusing it for each call, in sequence
var client = new ServicePartitionClient<HttpCommunicationClient>(_factory, _httpServiceUri, new ServicePartitionKey(1));
for (var i = 0; i < count; i++)
{
    var httpResponseMessage = await client.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url + $"?index={id}"));
}
3. Creating a new client for each call and running all the calls in parallel
var tasks = new List<Task>();
for (var i = 0; i < count; i++)
{
    tasks.Add(Task.Run(async () =>
    {
        var client = new ServicePartitionClient<HttpCommunicationClient>(_factory, _httpServiceUri, new ServicePartitionKey(1));
        var httpResponseMessage = await client.InvokeWithRetryAsync(c => c.HttpClient.GetAsync(c.Url + $"?index={id}"));
    }));
}
Task.WaitAll(tasks.ToArray());
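The timing harness itself isn't shown above; here is a minimal sketch of how each variant could be timed with a Stopwatch (TimeVariantAsync and the delegate wiring are illustrative assumptions, not the actual benchmark code).

// Minimal timing-harness sketch; the real benchmark code is not shown in this answer,
// so the names here (TimeVariantAsync, the delegate wiring) are illustrative.
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class BenchmarkHarness
{
    public static async Task<TimeSpan> TimeVariantAsync(string name, int count, Func<int, Task> variant)
    {
        var stopwatch = Stopwatch.StartNew();
        await variant(count);            // run one of the three call patterns above
        stopwatch.Stop();

        Console.WriteLine($"{name}: {count} calls took {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }
}

Each variant is passed in as a delegate, so the same harness can time the sequential, reused-client, and parallel runs for different counts.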
I then ran the test for a number of counts to get a form of average.
Now, this should be taken for what it is: not a complete and comprehensive test in a controlled environment. There are a number of factors that will affect this performance, such as the cluster size, what the called service actually does (in this case, nothing really), and the size and complexity of the payload (in this case, a very short string).
In this test I also wanted to see how Fabric Transport behaved, and its performance was similar to HTTP transport (honestly, I had expected it to be slightly better, but that might not be visible in this trivial scenario).
It's worth noting that for the parallel execution of 10,000 calls the performance was significantly degraded, likely because the service runs out of working memory. The effect of this might be that some of the client calls are faulted and retried (to be verified) after a delay. The way I measure the duration is the total time until all calls have completed. At the same time, it should be noted that the test does not really allow the service to use more than one node, since all the calls are routed to the same partition.
To conclude, the performance effect of reusing the client is nominal, and for trivial calls HTTP performs similarly to Fabric Transport.
Source: https://stackoverflow.com/questions/41790829/azure-service-fabric-invokewithretryasync-huge-overhead