C#: find images in HTML and download them

Backend · unresolved · 4 answers · 1981 views
予麋鹿 · 2020-12-30 18:21

I want to download all the images referenced in an HTML page. I don't know in advance how many images there will be, and I don't want to use "HTML Agility Pack".

I searched on Google but

4 answers
  • 2020-12-30 18:41

    In general terms

    1. You need to fetch the html page
    2. Search for img tags and extract the src="..." portion out of them
    3. Keep a list of all these extracted image urls.
    4. Download them one by one.

    Maybe this question about C# HTML parser will help you a little bit more.
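    The steps above can be sketched without HTML Agility Pack using WebClient plus a regular expression. Note this is a rough sketch, not a robust parser: the regex below only matches double-quoted src attributes and will miss edge cases (single quotes, unquoted values, srcset), and the URL is a placeholder.

        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Net;
        using System.Text.RegularExpressions;

        class ImageDownloader
        {
            static void Main()
            {
                string pageUrl = "http://example.com/page.html"; // placeholder

                // 1. Fetch the html page
                WebClient client = new WebClient();
                string html = client.DownloadString(pageUrl);

                // 2-3. Find img tags and keep a list of their src values
                var imageUrls = new List<string>();
                foreach (Match m in Regex.Matches(html, "<img[^>]+src=\"([^\"]+)\"",
                                                  RegexOptions.IgnoreCase))
                {
                    // Resolve relative references against the page url
                    Uri absolute = new Uri(new Uri(pageUrl), m.Groups[1].Value);
                    imageUrls.Add(absolute.ToString());
                }

                // 4. Download them one by one
                foreach (string imageUrl in imageUrls)
                {
                    string fileName = Path.GetFileName(new Uri(imageUrl).LocalPath);
                    client.DownloadFile(imageUrl, fileName);
                    Console.WriteLine("Saved " + fileName);
                }
            }
        }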

  • 2020-12-30 18:42

    You can use a WebBrowser control and extract the HTML from that e.g.

    System.Windows.Forms.WebBrowser objWebBrowser = new System.Windows.Forms.WebBrowser();
    objWebBrowser.Navigate(new Uri("your url of html document"));
    // Note: Navigate is asynchronous -- wait for the DocumentCompleted
    // event before reading Document, or it may still be null/empty.
    System.Windows.Forms.HtmlDocument objDoc = objWebBrowser.Document;
    System.Windows.Forms.HtmlElementCollection aColl = objDoc.GetElementsByTagName("IMG");
    ...
    

    or directly invoke the IHTMLDocument family of COM interfaces

  • 2020-12-30 18:44

    First of all I just can't leave this phrase alone:

    images stored in html

    That phrase is probably a big part of the reason your question was down-voted twice. Images are not stored in HTML. HTML pages contain references to images, which web browsers download separately.

    This means you need to do this in three steps: first download the html, then find the image references inside the html, and finally use those references to download the images themselves.

    To accomplish this, look at the System.Net.WebClient class. It has a .DownloadString() method you can use to get the html. Then you need to find all the <img /> tags. You're on your own here, but it's straightforward enough. Finally, you use WebClient's .DownloadData() or .DownloadFile() methods to retrieve the images.
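    A minimal sketch of those three steps with WebClient (the URLs are placeholders and error handling is omitted; DownloadData returns the raw bytes, while DownloadFile writes straight to disk):

        using System.IO;
        using System.Net;

        class Sketch
        {
            static void Main()
            {
                WebClient client = new WebClient();

                // Step 1: download the html (placeholder URL)
                string html = client.DownloadString("http://example.com/page.html");

                // Step 2: find the <img /> tags in `html` yourself
                // (string search or a regex -- see the other answers).

                // Step 3: retrieve an image found in step 2 (placeholder URL)
                byte[] bytes = client.DownloadData("http://example.com/logo.png");
                File.WriteAllBytes("logo.png", bytes);
            }
        }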

  • 2020-12-30 18:55

    People are giving you the right answer - you can't be picky and lazy, too. ;-)

    If you use a half-baked solution, you'll deal with a lot of edge cases. Here's a working sample that gets all links in an HTML document using HTML Agility Pack (it's included in the HTML Agility Pack download).

    And here's a blog post that shows how to grab all images in an HTML document with HTML Agility Pack and LINQ:

        // Bing Image Result for Cat, First Page
        string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";
    
        // For speed of dev, I use a WebClient
        WebClient client = new WebClient();
        string html = client.DownloadString(url);
    
        // Load the Html into the agility pack
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
    
        // Now, using LINQ to get all images whose class starts with "img_"
        // (the "//img" XPath already restricts the query to img nodes)
        List<HtmlNode> imageNodes =
            (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
             where node.Attributes["class"] != null
                && node.Attributes["class"].Value.StartsWith("img_")
             select node).ToList();
    
        foreach(HtmlNode node in imageNodes)
        {
            Console.WriteLine(node.Attributes["src"].Value);
        }
    