问题
I want to extract Title, Description & images from URL using HTML Agility utility so far i am not able to find an example which is easy to understand & can help me to do it.
I would appreciate if some can help me with example so that i can extract title, description & give user choice to select image from series of image (some thing similar to Facebook when we share a link).
Updated:
I have place a label for title, desc and a button , textbox on the .aspx page & i fire following code on button click event. but it return null for all values. may be i am doing something wrong.
i used following sample URLhttp://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2
protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
HtmlDocument doc = new HtmlDocument();
var response = txtURL.Text;
doc.LoadHtml(response);
String title = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "title"
select x.InnerText).FirstOrDefault();
String desc = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "description"
select x.InnerText).FirstOrDefault();
List<String> imgs = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "img"
select x.Attributes["src"].Value).ToList<String>();
lblTitle.Text = title;
lblDescription.Text = desc;
}
Above code gets me null value for all variable
if i modify the code with this
HtmlDocument doc = new HtmlDocument();
var url = txtURL.Text;
var webGet = new HtmlWeb();
doc = webGet.Load(url);
in this case it only get me value for title & description is null again
回答1:
protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(txtURL.Text));
request.Method = WebRequestMethods.Http.Get;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
String responseString = reader.ReadToEnd();
response.Close();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseString);
String title = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "title"
select x.InnerText).FirstOrDefault();
String desc = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "meta"
&& x.Attributes["name"] != null
&& x.Attributes["name"].Value.ToLower() == "description"
select x.Attributes["content"].Value).FirstOrDefault();
List<String> imgs = (from x in doc.DocumentNode.Descendants()
where x.Name.ToLower() == "img"
select x.Attributes["src"].Value).ToList<String>();
lblTitle.Text = title;
lblDescription.Text = desc;
}
来源:https://stackoverflow.com/questions/13158091/how-to-extract-a-urls-title-images-and-description-using-html-agility-utility