The company I work for runs a few hundred very dynamic web sites. It has decided to build a search engine and I was tasked with writing the scraper. Some of the sites run on old
TPL Dataflow and async-await are indeed powerful, and simple enough to do just what you need:
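using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

async Task<IEnumerable<string>> GetAllStringsAsync(IEnumerable<string> urls)
{
    var client = new HttpClient();
    // Thread-safe collection: several downloads may finish at the same time.
    var bag = new ConcurrentBag<string>();
    // The block runs the async delegate for each posted URL, at most 5 at a time.
    var block = new ActionBlock<string>(
        async url => bag.Add(await client.GetStringAsync(url)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
    foreach (var url in urls)
    {
        block.Post(url);
    }
    // Signal that no more URLs are coming, then wait for the queued ones to finish.
    block.Complete();
    await block.Completion;
    return bag;
}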
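
If you want to convince yourself that the throttle behaves as described before pointing it at real sites, here is a minimal sketch that swaps the HTTP call for a fake one. FakeFetchAsync, the example.com URLs, the 50 ms delay, and the counters are all hypothetical and only there for illustration. It assumes C# top-level statements (.NET 5 or later); on older frameworks the System.Threading.Tasks.Dataflow NuGet package may need to be referenced explicitly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

int inFlight = 0;     // simulated requests currently running
int maxInFlight = 0;  // highest value of inFlight observed so far
var gate = new object();

// Hypothetical stand-in for HttpClient.GetStringAsync:
// waits briefly and returns a fake body, while tracking concurrency.
async Task<string> FakeFetchAsync(string url)
{
    lock (gate)
    {
        inFlight++;
        maxInFlight = Math.Max(maxInFlight, inFlight);
    }
    await Task.Delay(50);
    lock (gate)
    {
        inFlight--;
    }
    return "body of " + url;
}

var bag = new ConcurrentBag<string>();
var block = new ActionBlock<string>(
    async url => bag.Add(await FakeFetchAsync(url)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

// Placeholder URLs; nothing is actually requested over the network.
foreach (var url in Enumerable.Range(1, 20).Select(i => "https://example.com/page" + i))
{
    block.Post(url);
}
block.Complete();
await block.Completion;

Console.WriteLine("Fetched: " + bag.Count);                // prints "Fetched: 20"
Console.WriteLine("Within limit: " + (maxInFlight <= 5));  // prints "Within limit: True"
```

All 20 items are processed, but no more than five of the fake requests are ever in flight at once. Raising or lowering MaxDegreeOfParallelism is the only change needed to tune how hard the scraper hits the sites.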