C#: find images in HTML and download them

予麋鹿 2020-12-30 18:21

I want to download all images stored in html (a web page). I don't know how many images there will be, and I don't want to use "HTML Agility Pack".

I searched on Google, but…

4 answers
  • 2020-12-30 18:41

    In general terms

    1. Fetch the HTML page.
    2. Search for img tags and extract the src="..." part from each of them.
    3. Keep a list of all the extracted image URLs (relative ones have to be resolved against the page URL first).
    4. Download them one by one.
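    A minimal sketch of those four steps in plain C# without HTML Agility Pack, assuming .NET 5 or later (top-level statements, HttpClient). The regex is a heuristic that only copes with ordinary markup: it is not an HTML parser, and it will miss images added by JavaScript or CSS. The names ExtractImageSources, ResolveImageUrls and DownloadImagesAsync are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Step 2: find <img> tags and pull out their src values (double-quoted,
// single-quoted or unquoted). Heuristic only: it does not skip comments or
// scripts, and it ignores srcset and lazy-load attributes such as data-src.
List<string> ExtractImageSources(string html)
{
    var imgTag = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""(?<src>[^""]*)""|'(?<src>[^']*)'|(?<src>[^\s""'>]+))",
        RegexOptions.IgnoreCase);
    var sources = new List<string>();
    foreach (Match m in imgTag.Matches(html))
    {
        // Attribute values may contain entities such as &amp;
        sources.Add(WebUtility.HtmlDecode(m.Groups["src"].Value.Trim()));
    }
    return sources;
}

// Step 3: turn each src into an absolute http(s) URL and drop duplicates.
// Relative references ("img/a.png", "/a.png") are resolved against the page URL;
// inline data: URIs contain the image itself, so there is nothing to download.
List<Uri> ResolveImageUrls(Uri pageUrl, IEnumerable<string> sources)
{
    var seen = new HashSet<string>();
    var result = new List<Uri>();
    foreach (string src in sources)
    {
        if (src.Length == 0 || src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            continue;

        Uri? url = null;
        if (Uri.TryCreate(src, UriKind.Absolute, out Uri? abs) &&
            (abs.Scheme == Uri.UriSchemeHttp || abs.Scheme == Uri.UriSchemeHttps))
            url = abs;
        else if (Uri.TryCreate(src, UriKind.Relative, out Uri? rel))
            url = new Uri(pageUrl, rel);

        if (url != null && seen.Add(url.AbsoluteUri))
            result.Add(url);
    }
    return result;
}

// Steps 1 and 4: fetch the page, then save each image into a folder.
// The file name is the last path segment (or a counter); two images with
// the same name overwrite each other in this sketch.
async Task DownloadImagesAsync(Uri pageUrl, string folder)
{
    using var http = new HttpClient();
    string html = await http.GetStringAsync(pageUrl);
    Directory.CreateDirectory(folder);

    List<Uri> images = ResolveImageUrls(pageUrl, ExtractImageSources(html));
    for (int i = 0; i < images.Count; i++)
    {
        string name = Path.GetFileName(images[i].LocalPath);
        if (string.IsNullOrEmpty(name))
            name = $"image{i}";
        byte[] bytes = await http.GetByteArrayAsync(images[i]);
        await File.WriteAllBytesAsync(Path.Combine(folder, name), bytes);
    }
}
```

    Adding `await DownloadImagesAsync(new Uri("https://example.com/page.html"), "images");` at the end of the file (example.com is a placeholder) would fetch that page and save its images into an images folder. Keeping the extraction and URL resolution as separate functions makes it easy to swap the regex for a real parser later.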

    Maybe this question about C# HTML parsers will help you a little more.

  • 2020-12-30 18:42

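    You can use a WebBrowser control, let it load the page, and read the img elements from its DOM (the control needs an STA thread with a running message loop, e.g. a Windows Forms app), e.g.: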

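    var objWebBrowser = new System.Windows.Forms.WebBrowser();
    // Navigate() is asynchronous: the Document is only ready in DocumentCompleted
    objWebBrowser.DocumentCompleted += (sender, e) =>
    {
        // GetElementsByTagName, not GetElementsByName (which matches the name="" attribute)
        System.Windows.Forms.HtmlElementCollection aColl = objWebBrowser.Document.GetElementsByTagName("img");
        foreach (System.Windows.Forms.HtmlElement img in aColl)
            Console.WriteLine(img.GetAttribute("src"));
    };
    objWebBrowser.Navigate(new Uri("your url of html document"));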
    

    Alternatively, you can call the IHTMLDocument family of COM interfaces (MSHTML) directly.

  • 2020-12-30 18:44

    First of all I just can't leave this phrase alone:

    images stored in html

    That phrase is probably a big part of the reason your question was down-voted twice. Images are not stored in the HTML (apart from the occasional inline data: URI). HTML pages contain references to images, which web browsers download separately.

    This means you need to do this in three steps: first download the HTML, then find the image references inside it, and finally use those references to download the images themselves.

    To accomplish this, look at the System.Net.WebClient class. Its DownloadString() method returns the HTML as a string. Then you need to find all the <img /> tags; you're on your own here, but it's straightforward enough. Finally, use WebClient's DownloadData() or DownloadFile() method to retrieve the images. (In .NET 6 and later WebClient is marked obsolete; HttpClient's GetStringAsync() and GetByteArrayAsync() do the same jobs.)
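    The WebClient calls named above, sketched under the assumption that the img tags have already been found and their src values turned into absolute URLs (the part this answer leaves to you). DownloadHtml, DownloadImages and LocalFileName are hypothetical helper names, and prefixing each file with a counter is just one way to keep two images with the same name from overwriting each other.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

#pragma warning disable SYSLIB0014 // WebClient is marked obsolete from .NET 6 on

// Step 1: DownloadString returns the page's HTML as a string.
string DownloadHtml(Uri pageUrl)
{
    using var client = new WebClient();
    return client.DownloadString(pageUrl);
}

// Hypothetical naming scheme: counter + last path segment (query string dropped),
// or "image<counter>" when the URL path has no file name.
string LocalFileName(Uri imageUrl, int index)
{
    string name = Path.GetFileName(imageUrl.LocalPath);
    return string.IsNullOrEmpty(name) ? $"image{index}" : $"{index}_{name}";
}

// Step 3: save every (already absolute) image URL with DownloadFile.
void DownloadImages(IEnumerable<Uri> imageUrls, string folder)
{
    Directory.CreateDirectory(folder);
    using var client = new WebClient();
    int index = 0;
    foreach (Uri url in imageUrls)
    {
        string target = Path.Combine(folder, LocalFileName(url, index++));
        try
        {
            client.DownloadFile(url, target);
        }
        catch (WebException ex)
        {
            // A single 404 or timeout should not abort the remaining downloads.
            Console.Error.WriteLine($"Skipping {url}: {ex.Message}");
        }
    }
}

#pragma warning restore SYSLIB0014
```

    For example, `DownloadImages(new[] { new Uri("https://example.com/img/cat.png") }, "images");` would save that image as 0_cat.png in an images folder. DownloadData() would work just as well if you want the bytes in memory instead of on disk.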

  • 2020-12-30 18:55

    People are giving you the right answer - you can't be picky and lazy, too. ;-)

    If you use a half-baked solution, you'll deal with a lot of edge cases. Here's a working sample that gets all links in an HTML document using HTML Agility Pack (it's included in the HTML Agility Pack download).

    And here's a blog post that shows how to grab all images in an HTML document with HTML Agility Pack and LINQ:

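        using System;
        using System.Collections.Generic;
        using System.Linq;
        using System.Net;
        using HtmlAgilityPack;

        // Bing Image Result for Cat, First Page
        string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

        // For speed of dev, I use a WebClient (disposed when done)
        string html;
        using (WebClient client = new WebClient())
        {
            html = client.DownloadString(url);
        }

        // Load the Html into the agility pack
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Now, using LINQ to get all Images.
        // Descendants("img") yields an empty sequence when there are no matches,
        // whereas SelectNodes("//img") returns null and would throw here.
        // The "img_" class prefix matched Bing's markup at the time; it may have changed.
        List<HtmlNode> imageNodes =
            (from HtmlNode node in doc.DocumentNode.Descendants("img")
             where node.GetAttributeValue("class", "").StartsWith("img_", StringComparison.Ordinal)
             select node).ToList();

        foreach (HtmlNode node in imageNodes)
        {
            // GetAttributeValue returns the default ("") instead of throwing when src is missing
            Console.WriteLine(node.GetAttributeValue("src", ""));
        }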
    