C# .NET 4.5 async / multithread?


礼貌的吻别 2021-01-30 03:32

I'm writing a C# console application that scrapes data from web pages.

This application will go to about 8000 web pages and scrape data (same format of data on each page

4 answers

时光取名叫无心 (OP)

2021-01-30 03:53

I recommend reading my reasonably-complete introduction to async/await.

First, make everything asynchronous, starting at the lower-level stuff:

public static async Task<DataSet> ScrapeDataAsync(string pageid)
{
  CookieAwareWebClient webClient = ...;
  var dsPageData = new DataSet();

  // Download the HTML for the REO page and load it into an HtmlDocument.
  // ConfigureAwait(false): the parsing below does not need the caller's context.
  string url = @"https://domain.com?&id=" + pageid + @"restofurl";
  string html = await webClient.DownloadStringTaskAsync(url).ConfigureAwait(false);
  var doc = new HtmlDocument();
  doc.LoadHtml(html);

  // A bunch of parsing with HtmlAgilityPack, storing the results in dsPageData.
  return dsPageData;
}

Then you can consume it as follows (using async with LINQ):

var allData = new DataSet();
var tasks = the8000urls.Select(async url =>
{
  var dataForOnePage = await ScrapeDataAsync(url);

  // Merge each table in dataForOnePage into allData.

});
await Task.WhenAll(tasks);
PushAllDataToSql(allData);
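
To make the Select + Task.WhenAll pattern concrete, here is a minimal self-contained sketch. It is not the scraper itself: WhenAllSketch and FakeScrapeAsync are hypothetical names, Task.Delay stands in for the real download, and the blocking GetAwaiter().GetResult() call in Main is only there so the sketch runs without any external library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class WhenAllSketch
{
    // Hypothetical stand-in for ScrapeDataAsync: simulates I/O with Task.Delay.
    static async Task<string> FakeScrapeAsync(string pageId)
    {
        await Task.Delay(100).ConfigureAwait(false);
        return "data for " + pageId;
    }

    public static async Task<string[]> ScrapeAllAsync(IEnumerable<string> pageIds)
    {
        // Select is lazy; Task.WhenAll enumerates it once, which starts
        // every task, and then completes when all of them have finished.
        IEnumerable<Task<string>> tasks = pageIds.Select(id => FakeScrapeAsync(id));

        // The result array is in the same order as the input tasks.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }

    static void Main()
    {
        // Blocking here is acceptable only because a plain console app
        // has no synchronization context to deadlock on.
        string[] results = ScrapeAllAsync(new[] { "1", "2", "3" })
            .GetAwaiter().GetResult();

        foreach (string r in results)
            Console.WriteLine(r);
    }
}
```

Returning each page's result and combining the array after WhenAll is one way to avoid touching shared state from inside the lambda at all; whether that suits the DataSet merge is a design choice.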

And use AsyncContext from my AsyncEx library since this is a console app:

class Program
{
  static int Main(string[] args)
  {
    try
    {
      return AsyncContext.Run(() => MainAsync(args));
    }
    catch (Exception ex)
    {
      Console.Error.WriteLine(ex);
      return -1;
    }
  }

  static async Task<int> MainAsync(string[] args)
  {
    ...
  }
}

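That's it. AsyncContext resumes the lambda's continuations on the main thread, so the merge code runs for one page at a time. No need for locking or continuations or any of that.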
