Question
So I just started learning C# last night. My first project is a simple image downloader, which downloads all images of a website using HtmlElementCollection.
Here's what I've got so far:
private void dl_Click(object sender, EventArgs e)
{
    System.Net.WebClient wClient = new System.Net.WebClient();
    HtmlElementCollection hecImages = Browser.Document.GetElementsByTagName("img");
    for (int i = 0; i < hecImages.Count; i++)
    {
        char[] ftype = new char[4];
        string gtype;
        try
        {
            // File type: copy the last four characters of the src (e.g. ".png").
            hecImages[i].GetAttribute("src").CopyTo(hecImages[i].GetAttribute("src").Length - 4, ftype, 0, 4);
            gtype = new string(ftype);
            // Copy the image to the local path.
            wClient.DownloadFile(hecImages[i].GetAttribute("src"), absPath + i.ToString() + gtype);
        }
        catch (System.Net.WebException)
        {
            expand_Exception_Log();
            System.Threading.Thread.Sleep(50);
        }
    }
}
Basically it renders the page in advance and looks for the images. This works pretty well, but for some reason it only downloads the thumbnails, not the full (high-res) images.
Additional Sources:
Documentation on WebClient.DownloadFile: http://msdn.microsoft.com/en-us/library/ez801hhe(v=vs.110).aspx
The DownloadFile method downloads to a local file data from the URI specified by the address parameter.
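For reference, a minimal self-contained sketch of that call might look like this (the URL and local path below are placeholders, not from the question):

using System.Net;

class DownloadFileExample
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // First argument: the remote URI; second: the local file to write.
            client.DownloadFile("http://example.com/image.png", @"C:\temp\image.png");
        }
    }
}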
Answer 1:
Take a gander at How can I use HTML Agility Pack to retrieve all the images from a website?
That answer uses a library called HTML Agility Pack to collect all <img src="" /> tags on a website.
In case that topic somehow disappears, I'm putting the code up here for those who need it but can't reach it.
// Required namespaces for this snippet.
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// The list that will hold every image URL found on the page.
public List<string> ImageList = new List<string>();

public void GetAllImages()
{
    // 'x' is a new WebClient instance used to fetch the page.
    WebClient x = new WebClient();
    // Download the page source of the URL as a string.
    string source = x.DownloadString(@"http://www.google.com");
    // 'document' is a new HtmlAgilityPack document.
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    // Parse the downloaded source with HtmlAgilityPack.
    document.LoadHtml(source);
    // For every <img> tag in the HTML, select its src attribute.
    foreach (var link in document.DocumentNode.Descendants("img")
                                 .Select(i => i.Attributes["src"]))
    {
        // Store each link found in the list.
        // You can declare this however you want.
        if (link != null)
            ImageList.Add(link.Value);
    }
}
Since you are rather new, as you stated, you can add HTML Agility Pack easily with NuGet.
To add it, right-click your project, click Manage NuGet Packages, search the Online tab on the left-hand side for HTML Agility Pack, and click Install. You then reference it in code with using HtmlAgilityPack;
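Equivalently (not mentioned in the original answer), you can install it from the Package Manager Console with the standard NuGet command:

Install-Package HtmlAgilityPack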
After all that, you should be fine writing a method that downloads every item stored in the ImageList list created above.
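For instance, here is a minimal sketch of such a method, assuming ImageList was already filled by GetAllImages(); the method name and the targetDir parameter are placeholders, not from the original answer:

// Hypothetical helper: downloads every URL collected in ImageList.
// Requires: using System; using System.IO; using System.Net;
public void DownloadAllImages(string targetDir)
{
    using (var client = new WebClient())
    {
        for (int i = 0; i < ImageList.Count; i++)
        {
            // Assumes absolute URLs; Uri.AbsolutePath strips any query string
            // so Path.GetExtension sees a clean file name.
            string ext = Path.GetExtension(new Uri(ImageList[i]).AbsolutePath);
            if (string.IsNullOrEmpty(ext)) ext = ".img";
            client.DownloadFile(ImageList[i], Path.Combine(targetDir, i + ext));
        }
    }
}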
Good luck!
EDIT: Added comments explaining what each section does.
EDIT2: Updated snippet to reflect user comment.
Source: https://stackoverflow.com/questions/26188948/download-all-images-of-a-website