How to wait for Azure Search to finish indexing a document? (For integration testing purposes)

Question

Scenario

I'm building a suite of automated integration tests. Each test pushes data into the Azure Search index before querying it and verifying the expected results.

Problem

Indexing happens asynchronously in the service, so the data isn't immediately available after the indexing call returns successfully.
Most of the time, of course, the test runs its query before the data is available.

What I've tried

I've tried querying the document until it's found:

// Wait for the indexed document to become available
// (the await must be parenthesized: Results belongs to the search result, not to the Task)
while ((await _index.Documents.SearchAsync("DocumentId")).Results.SingleOrDefault() == null) { }

But oddly enough, a search query issued right afterwards usually finds nothing:

// Works 10% of the time, even after the above loop
await _index.Documents.SearchAsync(query.Text);

Using an arbitrary pause works, but it isn't guaranteed to be long enough, and I'd like the tests to run as fast as possible.

Thread.Sleep(3000);

Azure Search documentation:

Finally, the code in the example above delays for two seconds. Indexing happens asynchronously in your Azure Search service, so the sample application needs to wait a short time to ensure that the documents are available for searching. Delays like this are typically only necessary in demos, tests, and sample applications.

Is there any solution that doesn't sacrifice test performance?

Answer 1:

If your service has multiple search units, there is no way to determine when a document has been fully indexed. This is a deliberate decision to favor increased indexing/query performance over strong consistency guarantees.

If you're running tests against a single-unit search service, this approach (repeatedly checking for the document's existence with a query rather than a lookup) should work.

Note that this will not work on a free-tier search service: it's hosted on multiple shared resources and does not count as a single unit, so you'll see the same brief inconsistency that you would with a dedicated multi-unit service.
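On a single-unit service, the query-based polling described above can be written without a busy loop. The following is a minimal sketch, not part of the Azure Search SDK: `IndexingWait` and `WaitUntilVisibleAsync` are hypothetical names, and the `isVisible` delegate is assumed to run a search that filters on the document's key and report whether the document came back. It awaits `Task.Delay` between attempts and gives up after a timeout instead of spinning forever.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical test-only helper (not an Azure Search SDK API).
public static class IndexingWait
{
    // Calls isVisible (assumed to run a search filtering on the document key)
    // until it returns true or the timeout has elapsed.
    // Returns true if the document became visible, false on timeout.
    public static async Task<bool> WaitUntilVisibleAsync(
        Func<Task<bool>> isVisible,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();

        while (true)
        {
            if (await isVisible().ConfigureAwait(false))
            {
                return true;
            }

            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }

            // Pause between queries instead of spinning: the empty-bodied while loop
            // in the question keeps a thread busy and queries the service back to back.
            await Task.Delay(pollInterval).ConfigureAwait(false);
        }
    }
}
```

In a test you might call it with a timeout of a few seconds and a poll interval of 50 to 100 ms, and fail the test explicitly when it returns `false`, so a document that never shows up is reported as a timeout rather than as a confusing assertion failure.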

Otherwise, one possible improvement would be to use retries along with a smaller sleep time.
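A sketch of that retry idea, assuming the test can express "the results I expect" as a predicate: rather than waiting for one document and then querying once, it re-runs the actual test query with a short delay until the predicate accepts the result or the attempts run out. `QueryRetry.RetryUntilAsync` is a hypothetical helper, not an SDK method.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical test-only helper (not an Azure Search SDK API): re-runs the real
// test query with a short delay until its result looks as expected.
public static class QueryRetry
{
    public static async Task<T> RetryUntilAsync<T>(
        Func<Task<T>> query,          // e.g. the search the test wants to assert on
        Func<T, bool> isExpected,     // e.g. "exactly one result with the indexed id"
        int maxAttempts,
        TimeSpan delayBetweenAttempts)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts), "At least one attempt is required.");
        }

        // First attempt runs immediately, with no delay.
        var result = await query().ConfigureAwait(false);

        // Further attempts, each after a short pause, until the result is accepted
        // or maxAttempts queries have been made in total.
        for (var attempt = 2; attempt <= maxAttempts && !isExpected(result); attempt++)
        {
            await Task.Delay(delayBetweenAttempts).ConfigureAwait(false);
            result = await query().ConfigureAwait(false);
        }

        // Return the last result either way so the test's own assertion reports
        // what the index actually returned.
        return result;
    }
}
```

Because the retried query is the one being asserted on, this may also help with the situation in the question, where the existence check succeeds but the next query still comes back empty.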

Answer 2:

The other answer by @HeatherNakama was very helpful. I want to add to it, but first a paraphrased summary:

There is no way to reliably know a document is ready to be searched on all replicas, so the only way a spin-lock waiting until a document is found could work is to use a single-replica search service. (Note: the free tier search service is not single-replica, and you can't do anything about that.)

With that in mind, I've created a sample repository with Azure Search integration tests that roughly works like this:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Search;         // ISearchIndexClient
using Microsoft.Azure.Search.Models;  // SearchParameters, DocumentSearchResult<T>

// In the test class (fixture.SearchService exposes FilterForId, defined below):
private void WaitForIndexing(string id)
{
    // For the free tier, or a service with multiple replicas, resort to this:
    // Thread.Sleep(2000);

    // Exponential backoff: sleep 25, 50, 100, ... ms, up to a last sleep of 1600 ms.
    var wait = 25;

    while (wait <= 2000)
    {
        Thread.Sleep(wait);
        // Block on the async call; GetAwaiter().GetResult() rethrows the original
        // exception instead of wrapping it in an AggregateException.
        var result = fixture.SearchService.FilterForId(id).GetAwaiter().GetResult();
        if (result.Results.Count == 1) return;
        if (result.Results.Count > 1) throw new Exception("Unexpected results");
        wait *= 2;
    }

    throw new Exception("Found nothing after waiting a while");
}

// In the search service class:
private readonly ISearchIndexClient _searchIndexClient;

public async Task<DocumentSearchResult<PersonDto>> FilterForId(string id)
{
    if (string.IsNullOrWhiteSpace(id) || !Guid.TryParse(id, out _))
    {
        throw new ArgumentException("Can only filter for guid-like strings", nameof(id));
    }

    var parameters = new SearchParameters
    {
        Top = 2, // We expect only one, but return max 2 so we can double check for errors
        Skip = 0,
        Facets = new string[] { },
        HighlightFields = new string[] { },
        Filter = $"id eq '{id}'",
        OrderBy = new[] { "search.score() desc", "registeredAtUtc desc" },
    };

    var result = await _searchIndexClient.Documents.SearchAsync<PersonDto>("*", parameters);

    if (result.Results.Count > 1)
    {
        throw new Exception($"Search filtering for id '{id}' unexpectedly returned more than 1 result. Are you sure you searched for an ID, and that it is unique?");
    }

    return result;
}

This might be used like this:

[SerializePropertyNamesAsCamelCase]
public class PersonDto
{
    [Key] [IsFilterable] [IsSearchable]
    public string Id { get; set; } = Guid.NewGuid().ToString();

    [IsSortable] [IsSearchable]
    public string Email { get; set; }

    [IsSortable]
    public DateTimeOffset? RegisteredAtUtc { get; set; }
}

[Theory]
[InlineData(0)]
[InlineData(1)]
[InlineData(2)]
[InlineData(3)]
[InlineData(5)]
[InlineData(10)]
public async Task Can_index_and_then_find_person_many_times_in_a_row(int count)
{
    await fixture.SearchService.RecreateIndex();

    for (int i = 0; i < count; i++)
    {
        var guid = Guid.NewGuid().ToString().Replace("-", "");
        var dto = new PersonDto { Email = $"{guid}@example.org" };
        await fixture.SearchService.IndexAsync(dto);

        WaitForIndexing(dto.Id);

        var searchResult = await fixture.SearchService.Search(dto.Id);

        Assert.Single(searchResult.Results, p => p.Document.Id == dto.Id);
    }
}

I have tested and confirmed that this reliably stays green on a Basic tier search service with 1 replica, and intermittently becomes red on the free tier.
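For a rough sense of the cost of the backoff in `WaitForIndexing`: it sleeps 25 ms before the first query and doubles the sleep while it stays at or below 2000 ms, so it sleeps at most seven times (25, 50, 100, 200, 400, 800 and 1600 ms). Ignoring the time the queries themselves take, the worst case before it gives up is

$$\sum_{k=0}^{6} 25 \cdot 2^{k}\ \text{ms} = 25\,(2^{7} - 1)\ \text{ms} = 3175\ \text{ms},$$

while a document that is visible on the first or second query costs only 25 ms or 75 ms of sleeping.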

Answer 3:

If waiting is needed only in tests, use a FluentWaitDriver or a similar component to wait there; I wouldn't pollute the application with thread delays. Depending on the nature of your search instance, the Azure indexer will have an acceptable delay of a few milliseconds to a few seconds.

Source: https://stackoverflow.com/questions/40108369/how-to-wait-for-azure-search-to-finish-indexing-document-for-integration-testin

Tags

.net

asynchronous

azure-cognitive-search

azure-search-.net-sdk