Grabbing meta-tags and comments using HTML Agility Pack

Submitted by 别来无恙 on 2019-12-29 06:31:30

Question


I've looked for tutorials on HTML Agility Pack, since it seems to do everything I need, but for such a powerful tool there is surprisingly little written about it online.

I am writing a simple method that will retrieve all tags with a given name:

public string[] GetTagsByName(string TagName, string Source) {
    ...
}

This could easily be done with a regular expression, but we all know that regex is the wrong tool for parsing HTML. So far I have the following code:

...
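// TODO: Clear Comments (can this be done or should I use RegEx?)
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Source);
List<string> tags = new List<string>();   // needs using System.Collections.Generic;
string xpath = "//" + TagName;            // e.g. "//meta" selects every <meta> element
// SelectNodes yields element nodes (HtmlNode, not HtmlTextNode) and returns null when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null) {
    foreach (HtmlNode node in nodes) {
        tags.Add(node.OuterHtml);         // OuterHtml is the element's full markup
    }
}
return tags.ToArray();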

I would like to first strip all comments from the HTML and then return the matching tags by name. If possible, I'd also like to return specific meta tags selected by attribute, such as the robots meta tag. I'm not very good with XPath, so any pointers on that would help.

Any help would be much appreciated.


Answer 1:


HtmlAgilityPack's HtmlDocument implements IXPathNavigable, so it uses the standard .NET XPath engine. Any XPath 1.0 documentation applies, especially documentation written for System.Xml.XPath.

"//comment()" finds all comments
"//meta" finds all "meta" elements
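A minimal sketch of how those two expressions could cover the question, written as a C# 9 top-level console program and assuming the HtmlAgilityPack NuGet package is referenced. It reuses the question's `GetTagsByName` name (with camelCase parameters); the sample HTML is made up for illustration. Comments are selected with `//comment()` and detached with `HtmlNode.Remove()` before the tag-name query runs, and because `SelectNodes` returns `null` rather than an empty collection when nothing matches, both results are null-checked.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Sketch of the question's GetTagsByName: strip comments, then collect the
// outer HTML of every element whose name matches tagName.
static string[] GetTagsByName(string tagName, string source)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(source);

    // "//comment()" selects every comment node in the document.
    HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
    if (comments != null) // SelectNodes returns null when nothing matches
    {
        foreach (HtmlNode comment in comments)
        {
            comment.Remove(); // detach the comment from its parent
        }
    }

    var tags = new List<string>();
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//" + tagName);
    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
            tags.Add(node.OuterHtml);
        }
    }
    return tags.ToArray();
}

// Made-up sample input: one commented-out meta tag and one real one.
string html = "<html><head><!-- <meta name=\"old\"> -->" +
              "<meta name=\"robots\" content=\"noindex\"></head><body></body></html>";
string[] metas = GetTagsByName("meta", html);
Console.WriteLine(metas.Length); // 1
```

As far as I know, HAP keeps a comment's contents as comment text instead of parsing them into elements, so `//meta` would not match the commented-out tag even without the removal step; stripping comments matters more if you later read the `InnerHtml` or `OuterHtml` of an enclosing node. HAP also appears to represent a `<!DOCTYPE>` declaration as a comment node, so it may be removed as well.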

HtmlDocument was designed to look very much like XmlDocument, so much of what examples and tutorials show for XmlDocument (for instance SelectNodes and SelectSingleNode) applies to it as well.
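Because `HtmlNode` offers `SelectSingleNode` much as `XmlNode` does, fetching a meta tag by attribute (the question's robots example) could be sketched as follows. `GetRobotsContent` is a hypothetical helper name; the code again assumes the HtmlAgilityPack package and C# 9 top-level statements, and the predicate `[@name='robots']` is plain XPath 1.0.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: returns the content attribute of <meta name="robots">,
// or null when the page has no such tag.
static string GetRobotsContent(string source)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(source);

    // Like XmlNode.SelectSingleNode: the first matching node, or null.
    HtmlNode meta = doc.DocumentNode.SelectSingleNode("//meta[@name='robots']");

    // GetAttributeValue returns the supplied default ("") if the attribute is absent.
    return meta?.GetAttributeValue("content", "");
}

string page = "<html><head><meta name=\"robots\" content=\"noindex, nofollow\"></head></html>";
Console.WriteLine(GetRobotsContent(page)); // noindex, nofollow
```

Attribute values are compared case-sensitively in XPath, so a page that writes `name="ROBOTS"` would not match this expression.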

Some MSDN links:

  • XPath Reference
  • Examples
  • XPath functions
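As an illustration of XPath 1.0 functions running on HtmlAgilityPack documents, the sketch below matches the robots meta tag case-insensitively with `translate()` (XPath 1.0 has no `lower-case()`), and evaluates `count()` through the `XPathNavigator` returned by `HtmlDocument.CreateNavigator()`, which is how HtmlDocument implements IXPathNavigable. It assumes the HtmlAgilityPack package and C# 9 top-level statements; the HTML string is made up.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><head>" +
             "<meta name=\"ROBOTS\" content=\"noindex\">" +
             "<meta name=\"description\" content=\"demo\">" +
             "</head></html>");

// translate() lowercases the attribute value so "ROBOTS", "Robots", etc. all match.
const string RobotsXPath =
    "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'robots']";
HtmlNode robots = doc.DocumentNode.SelectSingleNode(RobotsXPath);
Console.WriteLine(robots?.GetAttributeValue("content", "")); // noindex

// CreateNavigator() yields a standard XPathNavigator, which can evaluate
// expressions that return numbers rather than node sets, such as count().
XPathNavigator nav = doc.CreateNavigator();
double metaCount = (double)nav.Evaluate("count(//meta)");
Console.WriteLine(metaCount); // 2
```

`SelectSingleNode` and `SelectNodes` cover node-set queries; `Evaluate` is only needed when the expression returns a number, string, or boolean.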


Source: https://stackoverflow.com/questions/2354653/grabbing-meta-tags-and-comments-using-html-agility-pack
