Send parallel requests but only one per host with HttpClient and Polly to gracefully handle 429 responses

Submitted by 空扰寡人 on 2020-12-31 04:31:08

Question


Intro:

I am building a single-node web crawler to simply validate URLs are 200 OK in a .NET Core console application. I have a collection of URLs at different hosts to which I am sending requests with HttpClient. I am fairly new to using Polly and TPL Dataflow.

Requirements:

  1. I want to support sending multiple HTTP requests in parallel with a configurable MaxDegreeOfParallelism.
  2. I want to limit the number of parallel requests to any given host to 1 (or configurable). This is in order to gracefully handle per-host 429 TooManyRequests responses with a Polly policy. Alternatively, I could maybe use a Circuit Breaker to cancel concurrent requests to the same host on receipt of one 429 response and then proceed one-at-a-time to that specific host?
  3. I am perfectly fine with not using TPL Dataflow at all in favor of maybe using a Polly Bulkhead or some other mechanism for throttled parallel requests, but I am not sure what that configuration would look like in order to implement requirement #2.

Current Implementation:

My current implementation works, except that I often see x parallel requests to the same host return 429 at about the same time. They all pause for the retry policy, then all slam the same host again simultaneously, often still receiving 429s. Even if I distribute multiple instances of the same host evenly throughout the queue, my URL collection is overweighted with a few specific hosts that eventually start generating 429s anyway.

After receiving a 429, I think I only want to send one concurrent request to that host going forward to respect the remote host and pursue 200s.

Validator Method:

public async Task<int> GetValidCount(IEnumerable<Uri> urls, CancellationToken cancellationToken)
{
    var validator = new TransformBlock<Uri, bool>(
        async u => (await _httpClient.GetAsync(u, HttpCompletionOption.ResponseHeadersRead, cancellationToken)).IsSuccessStatusCode,
        new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = MaxDegreeOfParallelism}
    );
    foreach (var url in urls)
        await validator.SendAsync(url, cancellationToken);
    validator.Complete();
    var validUrlCount = 0;
    while (await validator.OutputAvailableAsync(cancellationToken))
    {
        if(await validator.ReceiveAsync(cancellationToken))
            validUrlCount++;
    }
    await validator.Completion;
    return validUrlCount;
}
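As an aside, the manual OutputAvailableAsync/ReceiveAsync receive loop above can be expressed more compactly by linking the TransformBlock to a counting ActionBlock. This is a sketch under the same assumptions as the method above (an ActionBlock has MaxDegreeOfParallelism = 1 by default, so the increment is safe without locking):

```csharp
// Sketch: replaces the receive loop in GetValidCount() above.
var validUrlCount = 0;
var counter = new ActionBlock<bool>(ok => { if (ok) validUrlCount++; });
validator.LinkTo(counter, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var url in urls)
    await validator.SendAsync(url, cancellationToken);
validator.Complete();

// Completion propagates from validator to counter via the link options.
await counter.Completion;
return validUrlCount;
```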

Here is the Polly policy applied to the HttpClient instance used in GetValidCount() above:

IAsyncPolicy<HttpResponseMessage> waitAndRetryTooManyRequests = Policy
    .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.TooManyRequests)
    .WaitAndRetryAsync(3,
        (retryCount, response, context) =>
            response.Result?.Headers.RetryAfter?.Delta ?? TimeSpan.FromMilliseconds(120),
        async (response, timespan, retryCount, context) =>
        {
            // log stuff
        });
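For completeness, a minimal sketch of how such a policy could be attached to a typed HttpClient via IHttpClientFactory, assuming the Microsoft.Extensions.Http.Polly package is referenced (the UrlValidator type name is hypothetical, standing in for whatever class holds GetValidCount()):

```csharp
// Sketch only: assumes Microsoft.Extensions.Http.Polly.
// UrlValidator is a hypothetical typed client wrapping GetValidCount().
services.AddHttpClient<UrlValidator>()
    .AddPolicyHandler(waitAndRetryTooManyRequests);
```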

Question:

How can I modify or replace this solution to add satisfaction of requirement #2?


Answer 1:


I'd introduce a flag, LimitedMode, to track that this particular client has entered limited mode. Below I declare two policies: a simple retry policy that only catches TooManyRequests and sets the flag, and an out-of-the-box Bulkhead policy.

    public void ConfigureServices(IServiceCollection services)
    {
        /* other configuration */

        var registry = services.AddPolicyRegistry();

        var catchPolicy = Policy.HandleResult<HttpResponseMessage>(r =>
            {
                LimitedMode = r.StatusCode == HttpStatusCode.TooManyRequests;
                return false;
            })
            .WaitAndRetryAsync(1, i => TimeSpan.FromSeconds(3)); 

        var bulkHead = Policy.BulkheadAsync<HttpResponseMessage>(1, 10, OnBulkheadRejectedAsync);

        registry.Add("catchPolicy", catchPolicy);
        registry.Add("bulkHead", bulkHead);

        services.AddHttpClient<CrapyWeatherApiClient>((client) =>
        {
            client.BaseAddress = new Uri("hosturl");
        }).AddPolicyHandlerFromRegistry(PolicySelector);
    }

Then you can dynamically decide which policy to apply using the PolicySelector mechanism: when limited mode is active, wrap the bulkhead policy with the catch-429 policy. When a success status code is received, switch back to regular mode without a bulkhead.

    private IAsyncPolicy<HttpResponseMessage> PolicySelector(IReadOnlyPolicyRegistry<string> registry, HttpRequestMessage request)
    {
        var catchPolicy = registry.Get<IAsyncPolicy<HttpResponseMessage>>("catchPolicy");
        var bulkHead = registry.Get<IAsyncPolicy<HttpResponseMessage>>("bulkHead");
        if (LimitedMode)
        {
            return catchPolicy.WrapAsync(bulkHead);
        }

        return catchPolicy;
    }        
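The answer assumes a LimitedMode member on the containing class but never shows it. Since the retry predicate and PolicySelector can run on different threads, a minimal sketch (my assumption, not part of the original answer) would declare the flag volatile so writes are promptly visible across threads:

```csharp
// Assumed declaration; the original answer does not show it.
// volatile ensures the flag set inside the policy predicate is
// visible to PolicySelector without additional synchronization.
private volatile bool LimitedMode;
```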



Answer 2:


Here is a method that creates a TransformBlock which prevents concurrent execution for messages with the same key. The key of each message is obtained by invoking the supplied keySelector function. Messages with the same key are processed sequentially with respect to each other (not in parallel). The key is also passed as an argument to the transform function, because it can be useful in some cases.

public static TransformBlock<TInput, TOutput>
    CreateExclusivePerKeyTransformBlock<TInput, TKey, TOutput>(
    Func<TInput, TKey, Task<TOutput>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
    if (dataflowBlockOptions == null)
        throw new ArgumentNullException(nameof(dataflowBlockOptions));
    keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;

    var internalCTS = CancellationTokenSource
        .CreateLinkedTokenSource(dataflowBlockOptions.CancellationToken);

    var maxDOP = dataflowBlockOptions.MaxDegreeOfParallelism;
    var taskScheduler = dataflowBlockOptions.TaskScheduler;

    var maxDopSemaphore
        = new SemaphoreSlim(maxDOP == -1 ? Int32.MaxValue : maxDOP);

    var perKeySemaphores = new ConcurrentDictionary<TKey, SemaphoreSlim>(
        keyComparer);

    // The degree of parallelism is controlled by the semaphores
    dataflowBlockOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;

    // An exclusive scheduler is needed for preserving the processing order
    dataflowBlockOptions.TaskScheduler =
        new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

    var block = new TransformBlock<TInput, TOutput>(async item =>
    {
        var key = keySelector(item);
        var perKeySemaphore = perKeySemaphores
            .GetOrAdd(key, _ => new SemaphoreSlim(1));
        await perKeySemaphore.WaitAsync(internalCTS.Token).ConfigureAwait(false);
        try
        {
            await maxDopSemaphore.WaitAsync(internalCTS.Token)
                .ConfigureAwait(false);
            try
            {
                // Invoke the transform using the provided TaskScheduler
                return await Task.Factory.StartNew(() => transform(item, key),
                    internalCTS.Token, TaskCreationOptions.DenyChildAttach,
                    taskScheduler).Unwrap().ConfigureAwait(false);
            }
            catch (Exception ex) when (!(ex is OperationCanceledException))
            {
                internalCTS.Cancel(); // The block has failed
                throw;
            }
            finally
            {
                maxDopSemaphore.Release();
            }
        }
        finally
        {
            perKeySemaphore.Release();
        }
    }, dataflowBlockOptions);

    _ = block.Completion.ContinueWith(_ => internalCTS.Dispose(),
        TaskScheduler.Default);

    dataflowBlockOptions.MaxDegreeOfParallelism = maxDOP; // Restore initial value
    dataflowBlockOptions.TaskScheduler = taskScheduler; // Restore initial value
    return block;
}

Usage example:

var validator = CreateExclusivePerKeyTransformBlock<Uri, string, bool>(
    async (uri, host) =>
    {
        return (await _httpClient.GetAsync(uri, HttpCompletionOption
            .ResponseHeadersRead, token)).IsSuccessStatusCode;
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 30,
        CancellationToken = token,
    },
    keySelector: uri => uri.Host,
    keyComparer: StringComparer.OrdinalIgnoreCase);

All execution options are supported (MaxDegreeOfParallelism, BoundedCapacity, CancellationToken, EnsureOrdered etc).

Below is an overload of the CreateExclusivePerKeyTransformBlock that accepts a synchronous delegate, and another method+overload that returns an ActionBlock instead of a TransformBlock, with the same behavior.

public static TransformBlock<TInput, TOutput>
    CreateExclusivePerKeyTransformBlock<TInput, TKey, TOutput>(
    Func<TInput, TKey, TOutput> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    return CreateExclusivePerKeyTransformBlock(
        (item, key) => Task.FromResult(transform(item, key)),
        dataflowBlockOptions, keySelector, keyComparer);
}

// An ITargetBlock is similar to an ActionBlock
public static ITargetBlock<TInput>
    CreateExclusivePerKeyActionBlock<TInput, TKey>(
    Func<TInput, TKey, Task> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (action == null) throw new ArgumentNullException(nameof(action));
    var block = CreateExclusivePerKeyTransformBlock(async (item, key) =>
        { await action(item, key).ConfigureAwait(false); return (object)null; },
        dataflowBlockOptions, keySelector, keyComparer);
    block.LinkTo(DataflowBlock.NullTarget<object>());
    return block;
}

public static ITargetBlock<TInput>
    CreateExclusivePerKeyActionBlock<TInput, TKey>(
    Action<TInput, TKey> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (action == null) throw new ArgumentNullException(nameof(action));
    return CreateExclusivePerKeyActionBlock(
        (item, key) => { action(item, key); return Task.CompletedTask; },
        dataflowBlockOptions, keySelector, keyComparer);
}
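For symmetry with the TransformBlock usage example above, here is a sketch of using the ActionBlock variant for the same validation task. It makes the same assumptions as the earlier example (an _httpClient field and a token in scope) and counts successes with Interlocked since multiple hosts are processed in parallel:

```csharp
var validCount = 0;
var validatorAction = CreateExclusivePerKeyActionBlock<Uri, string>(
    async (uri, host) =>
    {
        var response = await _httpClient.GetAsync(uri,
            HttpCompletionOption.ResponseHeadersRead, token);
        if (response.IsSuccessStatusCode)
            Interlocked.Increment(ref validCount);
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 30,
        CancellationToken = token,
    },
    keySelector: uri => uri.Host,
    keyComparer: StringComparer.OrdinalIgnoreCase);
```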

Caution: This implementation allocates one SemaphoreSlim per key and keeps a reference to it until the block is finally garbage collected. This could be an issue if the number of distinct keys is huge. There is an implementation of a less allocation-heavy async lock here, which internally stores only the SemaphoreSlims currently in use (plus a small pool of released SemaphoreSlims available for reuse); it could replace the ConcurrentDictionary<TKey, SemaphoreSlim> used by this implementation.



Source: https://stackoverflow.com/questions/57022754/send-parallel-requests-but-only-one-per-host-with-httpclient-and-polly-to-gracef
