Question
Forgive my ignorance on the subject.
I am using
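// p is the URL to fetch, r is the local file path to save it to.
string p = "http://" + Textbox2.Text;
string r = textBox3.Text;
System.Net.WebClient webclient = new System.Net.WebClient();
// DownloadFile saves only the single resource at p; it does not follow links.
webclient.DownloadFile(p, r);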
to download a webpage. Can you please help me enhance the code so that it downloads the entire website? I tried HTML screen scraping, but it returns only the href links of the index.html files. How do I proceed?
Thanks
Answer 1:
Scraping a website is actually a lot of work, with a lot of corner cases.
Invoke wget instead. The manual explains how to use the "recursive retrieval" options.
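As a rough sketch of what invoking wget from C# could look like, assuming wget is installed and on the PATH: the class name WgetMirror and its methods are made up for illustration, and the quoting is naive (it assumes the URL and directory contain no double quotes). The flags are standard wget recursive-retrieval options.

```csharp
using System;
using System.Diagnostics;

public static class WgetMirror
{
    // Builds the wget command line for a recursive download (hypothetical helper).
    public static string BuildArguments(string url, string outputDir, int depth)
    {
        return string.Join(" ", new[]
        {
            "--recursive",                              // follow links from the start page
            "--level=" + depth,                         // maximum recursion depth
            "--page-requisites",                        // also fetch images, CSS, scripts
            "--convert-links",                          // rewrite links for local browsing
            "--no-parent",                              // never ascend above the start directory
            "--directory-prefix=\"" + outputDir + "\"", // where to save the files
            "\"" + url + "\""
        });
    }

    // Runs wget and returns its exit code. Assumes "wget" can be found on the PATH.
    public static int Run(string url, string outputDir, int depth)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "wget",
            Arguments = BuildArguments(url, outputDir, depth),
            UseShellExecute = false
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Something like WgetMirror.Run("http://example.com/", "site", 2) would then mirror two levels of links into a local "site" folder. An exit code of 0 means wget reported success.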
Answer 2:
// Requires: using System.IO; using System.Net;
protected string GetWebString(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // The using blocks dispose the response and reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
This puts the webpage at the supplied URL into a string.
You can then use a regular expression to parse the string.
This little piece pulls specific links out of craigslist and adds them to an ArrayList. Modify it to suit your purpose.
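// Requires: using System.Collections; using System.Text.RegularExpressions;
// Note: the pages parameter is not used by this snippet.
protected ArrayList GetListings(int pages)
{
    ArrayList list = new ArrayList();
    string page = GetWebString("http://albany.craigslist.org/bik/");
    // Match site-relative listing links of the form <p><a href="/.../....html">title-</a>
    MatchCollection listingMatches = Regex.Matches(page, "(<p><a href=\")(?<LINK>/.+/.+[.]html)(\">)(?<TITLE>.*)(-</a>)");
    foreach (Match m in listingMatches)
    {
        // LINK is site-relative, so prefix the host to get an absolute URL.
        list.Add("http://albany.craigslist.org" + m.Groups["LINK"].Value);
    }
    return list;
}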
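To go from extracting links on one page to downloading a whole site, the same idea can be repeated in a loop: fetch a page, pull out its links, and queue the ones on the same host. Below is a minimal sketch of that, not a complete crawler. The names SiteCrawler, ExtractSameHostLinks, and Crawl are made up for illustration, and the href regex is deliberately simplistic. It saves only the HTML of each page (no images or CSS) and ignores robots.txt, redirects to other hosts, and encoding issues.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class SiteCrawler
{
    // Pulls href values out of the HTML, resolves them against the page URL,
    // and keeps only http(s) links on the same host as the page.
    public static List<Uri> ExtractSameHostLinks(string html, Uri pageUri)
    {
        var links = new List<Uri>();
        // Capture the href value up to the closing quote or a '#' fragment.
        foreach (Match m in Regex.Matches(html, "href\\s*=\\s*[\"'](?<LINK>[^\"'#]+)", RegexOptions.IgnoreCase))
        {
            Uri resolved;
            if (Uri.TryCreate(pageUri, m.Groups["LINK"].Value, out resolved)
                && (resolved.Scheme == Uri.UriSchemeHttp || resolved.Scheme == Uri.UriSchemeHttps)
                && resolved.Host == pageUri.Host)
            {
                links.Add(resolved);
            }
        }
        return links;
    }

    // Breadth-first crawl from start, saving at most maxPages pages into outputDir.
    public static void Crawl(Uri start, string outputDir, int maxPages)
    {
        var seen = new HashSet<string> { start.AbsoluteUri };
        var queue = new Queue<Uri>();
        queue.Enqueue(start);
        Directory.CreateDirectory(outputDir);
        int saved = 0;
        using (var client = new WebClient())
        {
            while (queue.Count > 0 && saved < maxPages)
            {
                Uri current = queue.Dequeue();
                string html;
                try { html = client.DownloadString(current); }
                catch (WebException) { continue; } // skip pages that fail to load
                File.WriteAllText(Path.Combine(outputDir, "page" + saved + ".html"), html);
                saved++;
                foreach (Uri link in ExtractSameHostLinks(html, current))
                {
                    // HashSet.Add returns false for URLs that were already queued.
                    if (seen.Add(link.AbsoluteUri)) queue.Enqueue(link);
                }
            }
        }
    }
}
```

The maxPages cap and the same-host check are there so the crawl cannot run away across the web. For anything beyond a small site, a real HTML parser and wget-style handling of page requisites would likely be needed.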
Source: https://stackoverflow.com/questions/2091758/download-an-entire-website-in-c-sharp