html-agility-pack | 易学教程

Grabbing meta-tags and comments using HTML Agility Pack

Question: I've looked for tutorials on using HTML Agility Pack, as it seems to do everything I want, but for such a powerful tool there is surprisingly little written about it on the Internet. I am writing a simple method that will retrieve any given tag based on its name: public string[] GetTagsByName(string TagName, string Source) { ... } This could easily be done with a regular expression, but we all know that using regex to parse HTML isn't right. So far I have the following code: ... //
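
One possible shape for the method described in the question, as a minimal sketch rather than the asker's actual code (which is elided above). It assumes the HtmlAgilityPack NuGet package is referenced; the class name `TagGrabber` and the `GetComments` helper are hypothetical names introduced here for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class TagGrabber
{
    // Returns the outer HTML of every element named tagName (e.g. "meta").
    public static string[] GetTagsByName(string tagName, string source)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);
        return doc.DocumentNode
            .Descendants(tagName)
            .Select(node => node.OuterHtml)
            .ToArray();
    }

    // Hypothetical helper: returns the outer HTML of every comment node.
    public static string[] GetComments(string source)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);
        return doc.DocumentNode
            .Descendants()
            .Where(node => node.NodeType == HtmlNodeType.Comment)
            .Select(node => node.OuterHtml)
            .ToArray();
    }
}
```

Calling `TagGrabber.GetTagsByName("meta", html)` would then return one string per meta element. Html Agility Pack may also report a doctype declaration as a comment node, so the comment results might need filtering.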

How can I use iText to convert HTML with images and hyperlinks to PDF?

Question: I'm trying to convert HTML to PDF using iTextSharp in an ASP.NET web application that uses both MVC and Web Forms. The <img> and <a> elements have absolute and relative URLs, and some of the <img> elements are base64. Typical answers here at SO and in Google search results use generic HTML-to-PDF code with XMLWorkerHelper that looks something like this: using (var stringReader = new StringReader(xHtml)) { using (Document document = new Document()) { PdfWriter writer = PdfWriter.GetInstance

Html Agility Pack - Problem selecting subnode

Question: I want to export my Asics running plan to iCal, and since Asics do not offer this service, I decided to build a little scraper for my own personal use. I want to take all the scheduled runs from my plan and generate an iCal feed based on them. I am using C# and Html Agility Pack. I want to iterate through all my scheduled runs (they are div nodes) and then select a few different nodes within each run node. My code looks like this: foreach (var run in doc

Grab all text from html with Html Agility Pack

Question: Input: <html><body>foo <a href='http://www.example.com'>bar</a> baz</body></html> Output: foo bar baz. I know of htmldoc.DocumentNode.InnerText, but it will give foobarbaz. I want to get each text separately, not all at once. Answer 1: var root = doc.DocumentNode; var sb = new StringBuilder(); foreach (var node in root.DescendantNodesAndSelf()) { if (!node.HasChildNodes) { string text = node.InnerText; if (!string.IsNullOrEmpty(text)) sb.AppendLine(text.Trim()); } } This does what you need, but I am
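
The same idea can be sketched by filtering on node type instead of checking `HasChildNodes`. This is an illustrative variant, not the answerer's code; it assumes the HtmlAgilityPack NuGet package is referenced, and `TextExtractor` and `GetTextFragments` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextExtractor
{
    // Returns each non-empty text fragment separately, in document order.
    public static List<string> GetTextFragments(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
            .DescendantsAndSelf()
            .Where(node => node.NodeType == HtmlNodeType.Text)
            // Decode entities such as &amp; and strip surrounding whitespace.
            .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

For the input in the question this should yield the three fragments "foo", "bar" and "baz" as separate strings. Text inside script or style elements would also be returned, so those may need to be excluded for real pages.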

Can i serialize HtmlAgilityPack.HtmlDocument [closed]

Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago. Server Error in '/' Application. Type 'HtmlAgilityPack.HtmlDocument' in Assembly 'HtmlAgilityPack, Version=1.4.0.0, Culture=neutral, PublicKeyToken=bd319b19eaf3b43a' is not marked as serializable. Description: An

HtmlAgilityPack query returning no value

Question: I've been struggling for 2 days. I'm using C# and HtmlAgilityPack within a .NET 4.5 WinForms project to extract data from a website (the fields I want to extract are $ flow and B/S ratio). I get to the field (flow : /n/t/t/t; instead of flow 245 M) but I get no value. I have no idea why the query returns no value when I can see the value in the web page. I would like to see if someone else can find the reason for the nodes = null result of my query. This is the URL of the queried web page: http://finance

Foreach not iterating through elements

Question: I have an HTML document and I'm getting elements based on a class. Once I have them, I go through each element and get further elements: var doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(content); var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]"); foreach (var row in rows) { var name = row.SelectSingleNode("//span[contains(@class, 'name')]").InnerText; var surname = row.SelectSingleNode("//span[contains(@class, 'surname')]").InnerText; customers.Add(new
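
A likely cause of the symptom in the title is that an XPath starting with `//` is evaluated from the document root even when called on `row`, so every iteration finds the same first span. The sketch below uses `.//` to search relative to the current row. It assumes the HtmlAgilityPack NuGet package and C# 7 tuples; `RowReader` and `ReadRows` are hypothetical names, and the markup shape is assumed from the XPath in the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowReader
{
    public static List<(string Name, string Surname)> ReadRows(string content)
    {
        var result = new List<(string Name, string Surname)>();
        var doc = new HtmlDocument();
        doc.LoadHtml(content);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            // The leading "." makes the query relative to this row.
            // Caveat: contains(@class, 'name') also matches the class "surname",
            // so this relies on the name span appearing first in each row.
            var name = row.SelectSingleNode(".//span[contains(@class, 'name')]")?.InnerText.Trim();
            var surname = row.SelectSingleNode(".//span[contains(@class, 'surname')]")?.InnerText.Trim();
            result.Add((name, surname));
        }
        return result;
    }
}
```

With the relative paths, each tuple should carry the values from its own row rather than repeating the first row's values.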

Retrieve parts of text inside <li>

Question: I have HTML like this: <li class="in-ttl-b">(a) kanji; a Chinese character [ideograph] <ul class="list-data-b-in"><li class="text-jejp text-c">漢字で書く</li><li class="text-jeen text-c">write in kanji [Chinese characters]</li></ul> <ul class="list-data-b-in"><li class="text-jejp text-c">常用漢字</li><li class="text-jeen text-c">Chinese characters for everyday use (in Japan)</li></ul> </li> How can I get only kanji; a Chinese

Selective screen scraping with HTMLAgilityPack and XPath

Question: [This question has a relative that lives at: Screen scraping with htmlAgilityPack and XPath] I have some HTML to parse which has the following general appearance: ... <tr> <td><a href="" title="">Text Data here (1)</a></td> <td>Text Data here(2)</td> <td>Text Data here(3)</td> <td>Text Data here(4)</td> <td>Text Data here(5)</td> <td>Text Data here(6)</td> <td><a href="link here {1}" class="image"><img alt="" src="" /></a></td> </tr> <tr> <td><a href="" title="">Text Data here (1)</a></td> <td