Download all images of a website

Submitted by 陌路散爱 on 2020-01-01 07:13:35

Question


So I just started learning C# last night. The first project I started is a simple image downloader, which downloads all images of a website using HtmlElementCollection.

Here's what I got so far:

    private void dl_Click(object sender, EventArgs e)
    {
        System.Net.WebClient wClient = new System.Net.WebClient();

        HtmlElementCollection hecImages = Browser.Document.GetElementsByTagName("img");

        // Note: the condition must be i < hecImages.Count, not Count - 1,
        // otherwise the last image is skipped
        for (int i = 0; i < hecImages.Count; i++)
        {
            try
            {
                string src = hecImages[i].GetAttribute("src");

                // File type: the last four characters of the URL, e.g. ".png"
                // (this assumes a three-letter extension)
                string gtype = src.Substring(src.Length - 4);

                // Copy the image to the local path
                wClient.DownloadFile(src, absPath + i.ToString() + gtype);
            }
            catch (System.Net.WebException)
            {
                expand_Exception_Log();
                System.Threading.Thread.Sleep(50);
            }
        }
    }

Basically it renders the page in advance and then looks for the images. This works pretty well, but for some reason it only downloads the thumbnails, not the full (high-res) images.
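One common reason for getting only thumbnails is that the `<img>` tag's `src` points at a small preview, while the full-size file is the target of the surrounding `<a>` link. The sketch below shows how that could be checked from inside the question's loop; the parent-anchor structure is an assumption about the target site, not something the original code guarantees.

    // Hypothetical sketch: if a thumbnail <img> is wrapped in
    // <a href="full.jpg">, the anchor's href may be the high-res file.
    // Whether this applies depends entirely on the site's markup.
    string url = hecImages[i].GetAttribute("src");

    HtmlElement parent = hecImages[i].Parent;
    if (parent != null && parent.TagName.Equals("A", StringComparison.OrdinalIgnoreCase))
    {
        string href = parent.GetAttribute("href");
        if (!string.IsNullOrEmpty(href) &&
            (href.EndsWith(".jpg") || href.EndsWith(".png") || href.EndsWith(".gif")))
        {
            url = href; // prefer the linked full-size image over the thumbnail
        }
    }

    wClient.DownloadFile(url, absPath + i.ToString() + url.Substring(url.Length - 4));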

Additional Sources:

Documentation on WebClient.DownloadFile: http://msdn.microsoft.com/en-us/library/ez801hhe(v=vs.110).aspx

The DownloadFile method downloads data from the URI specified in the address parameter to a local file.


Answer 1:


Take a gander at How can I use HTML Agility Pack to retrieve all the images from a website?

That answer uses a library called HTML Agility Pack to collect all <img src="" /> tags on a website.

If that topic somehow disappears, I'm putting this up for those who need it but can't reach that topic.

// A list to store the image URLs (initialized so Add doesn't throw)
public List<string> ImageList = new List<string>();

public void GetAllImages()
{
    // Create a WebClient instance
    WebClient x = new WebClient();

    // Set the URL, then download the page's HTML source
    string source = x.DownloadString(@"http://www.google.com");

    // Create an HtmlAgilityPack document and load the source into it
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(source);

    // For every <img> tag in the HTML, select its src attribute
    foreach (var link in document.DocumentNode.Descendants("img")
                                 .Select(i => i.Attributes["src"]))
    {
        // 'link' is already the src attribute, so just store its value.
        // You can store these however you want.
        ImageList.Add(link.Value);
    }
}

Since you are rather new, as you stated, you can add HTML Agility Pack easily with NuGet. To add it, right-click on your project, click Manage NuGet Packages, search the Online tab on the left-hand side for HTML Agility Pack, and click Install. You also need to reference it at the top of your file with using HtmlAgilityPack;

After all that, you should be fine writing a method that downloads every URL stored in the ImageList list created above.
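A minimal sketch of that download step might look like the following. The folder name, the base URI, and the DownloadAll method name are all assumptions for illustration; relative src values (like "/images/logo.png") have to be resolved against the page's URL before WebClient can fetch them.

    using System.IO;

    // Hypothetical helper: download everything collected in ImageList.
    public void DownloadAll(string targetFolder)
    {
        using (WebClient client = new WebClient())
        {
            // Assumed base: the same page that was scraped above
            Uri baseUri = new Uri("http://www.google.com");
            int n = 0;

            foreach (string src in ImageList)
            {
                // Resolve relative paths against the page's URL
                Uri absolute = new Uri(baseUri, src);

                string ext = Path.GetExtension(absolute.AbsolutePath);
                if (string.IsNullOrEmpty(ext)) ext = ".img";

                client.DownloadFile(absolute, Path.Combine(targetFolder, n++ + ext));
            }
        }
    }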

Good luck!

EDIT: Added comments explaining what each section does.

EDIT2: Updated snippet to reflect user comment.



Source: https://stackoverflow.com/questions/26188948/download-all-images-of-a-website
