C# .NET 4.5 async / multithread?

礼貌的吻别 2021-01-30 03:32

I'm writing a C# console application that scrapes data from web pages.

This application will visit about 8,000 web pages and scrape data (same format of data on each page

4 answers
  •  时光取名叫无心
    2021-01-30 03:53

    I recommend reading my reasonably-complete introduction to async/await.

    First, make everything asynchronous, starting at the lower-level stuff:

    public static async Task<DataSet> ScrapeDataAsync(string pageid)
    {
      CookieAwareWebClient webClient = ...;
      var dsPageData = new DataSet();
    
      // DOWNLOAD HTML FOR THE REO PAGE AND LOAD IT INTO AN HTMLDOCUMENT
      string url = @"https://domain.com?&id=" + pageid + @"restofurl";
      string html = await webClient.DownloadStringTaskAsync(url).ConfigureAwait(false);
      var doc = new HtmlDocument();
      doc.LoadHtml(html);
    
      // A BUNCH OF PARSING WITH HTMLAGILITY AND STORING IN dsPageData 
      return dsPageData;
    }
    

    Then you can consume it as follows (using async with LINQ):

    var allData = new DataSet();
    var tasks = the8000urls.Select(async url =>
    {
      var dataForOnePage = await ScrapeDataAsync(url);
    
      // merge each table in dataForOnePage into allData
    
    });
    await Task.WhenAll(tasks);
    PushAllDataToSql(allData);
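
    To try the fan-out pattern without touching the network, here is a minimal self-contained sketch. It is a variation on the snippet above, not the same code: each task returns its own DataSet and the merge happens once, after Task.WhenAll, so it does not depend on which thread the continuations run on. FakeScrapeAsync, ScrapeAllAsync, and the "Page" table are hypothetical names standing in for the real scraping and parsing code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class ScrapeDemo
{
    // Hypothetical stand-in for ScrapeDataAsync: no network access, just a short
    // delay followed by a one-row table that records the page id.
    public static async Task<DataSet> FakeScrapeAsync(string pageId)
    {
        await Task.Delay(10).ConfigureAwait(false);

        var table = new DataTable("Page");
        table.Columns.Add("PageId", typeof(string));
        table.Rows.Add(pageId);

        var ds = new DataSet();
        ds.Tables.Add(table);
        return ds;
    }

    // Starts one task per page id, awaits them all, then merges the results
    // sequentially so no two merges ever run at the same time.
    public static async Task<DataSet> ScrapeAllAsync(IEnumerable<string> pageIds)
    {
        Task<DataSet>[] tasks = pageIds.Select(id => FakeScrapeAsync(id)).ToArray();
        DataSet[] pages = await Task.WhenAll(tasks).ConfigureAwait(false);

        var allData = new DataSet();
        foreach (DataSet page in pages)
        {
            // The tables have no primary key, so rows from same-named tables are appended.
            allData.Merge(page);
        }
        return allData;
    }

    public static void Main()
    {
        IEnumerable<string> ids = Enumerable.Range(1, 5).Select(i => i.ToString());
        // Blocking here is acceptable in a console app that has no synchronization context.
        DataSet all = ScrapeAllAsync(ids).GetAwaiter().GetResult();
        Console.WriteLine(all.Tables["Page"].Rows.Count);
    }
}
```

    Returning each page's data from its task and merging afterwards keeps all results in memory until every download has finished, which may or may not be acceptable for 8,000 pages.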
    

    And use AsyncContext from my AsyncEx library since this is a console app:

    class Program
    {
      static int Main(string[] args)
      {
        try
        {
          return AsyncContext.Run(() => MainAsync(args));
        }
        catch (Exception ex)
        {
          Console.Error.WriteLine(ex);
          return -1;
        }
      }
    
      static async Task<int> MainAsync(string[] args)
      {
        ...
      }
    }
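
    For comparison, here is a sketch of the same entry point without the AsyncEx dependency, assuming you are willing to block the main thread directly. This is a different technique from AsyncContext: no single-threaded context is installed, so continuations after each await run on thread-pool threads, and any code that mutates shared state from inside the tasks would need its own synchronization. The body of MainAsync is a placeholder.

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    public static int Main(string[] args)
    {
        try
        {
            // Blocks until MainAsync completes. GetAwaiter().GetResult() rethrows the
            // original exception instead of wrapping it in an AggregateException.
            return MainAsync(args).GetAwaiter().GetResult();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);
            return -1;
        }
    }

    public static async Task<int> MainAsync(string[] args)
    {
        // Placeholder for the real asynchronous work.
        await Task.Delay(10);
        return 0;
    }
}
```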
    

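    That's it. No need for locking or continuations or any of that: AsyncContext supplies a single-threaded context, so the merge code after each await runs on the main thread, one page at a time.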
