html-agility-pack

Grabbing meta-tags and comments using HTML Agility Pack

别来无恙 submitted on 2019-12-29 06:31:30
Question: I've looked for tutorials on HTML Agility Pack, since it seems to do everything I want, but for such a powerful tool there is surprisingly little written about it on the Internet. I am writing a simple method that will retrieve any given tag by name: public string[] GetTagsByName(string TagName, string Source) { ... } This could easily be done with a regular expression, but we all know that regex is the wrong tool for parsing HTML. So far I have the following code: ... //
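A minimal sketch of the method the question describes, written as a top-level C# program and assuming the HtmlAgilityPack NuGet package. `GetTagsByName` mirrors the question's signature; `GetComments` is a hypothetical companion for the "comments" part of the title. Comments are not elements, so they are filtered by node type rather than by name.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper mirroring the question's signature: parse the markup
// and return the outer HTML of every element with the given tag name.
// HAP stores element names in lower case, so the name is lower-cased first.
string[] GetTagsByName(string tagName, string source)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(source);
    return doc.DocumentNode
              .Descendants(tagName.ToLowerInvariant())
              .Select(node => node.OuterHtml)
              .ToArray();
}

// Hypothetical helper: comments are nodes whose NodeType is
// HtmlNodeType.Comment, so they are selected by type, not by name.
string[] GetComments(string source)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(source);
    return doc.DocumentNode
              .Descendants()
              .Where(node => node.NodeType == HtmlNodeType.Comment)
              .Select(node => node.OuterHtml)
              .ToArray();
}

var sample = "<html><head><meta name=\"description\" content=\"demo\"><!-- build 42 --></head><body></body></html>";
foreach (var meta in GetTagsByName("meta", sample))
    Console.WriteLine(meta);
foreach (var comment in GetComments(sample))
    Console.WriteLine(comment);
```

For a meta tag's value, `node.GetAttributeValue("content", "")` on each matched node avoids string handling of the outer HTML.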

How can I use iText to convert HTML with images and hyperlinks to PDF?

佐手、 submitted on 2019-12-29 00:47:13
Question: I'm trying to convert HTML to PDF using iTextSharp in an ASP.NET web application that uses both MVC and Web Forms. The <img> and <a> elements have absolute and relative URLs, and some of the <img> elements are base64-encoded. Typical answers here on SO and in Google search results use generic HTML-to-PDF code with XMLWorkerHelper that looks something like this: using (var stringReader = new StringReader(xHtml)) { using (Document document = new Document()) { PdfWriter writer = PdfWriter.GetInstance
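One possible workaround for the relative-URL part, and an assumption rather than anything from the question or from iText itself: rewrite relative `src`/`href` values to absolute URLs with HtmlAgilityPack before handing the markup to XMLWorkerHelper. This is a top-level C# sketch assuming the HtmlAgilityPack NuGet package; `AbsolutizeUrls` and its `baseUrl` parameter are hypothetical. Base64 `data:` URIs are deliberately left untouched; whether XMLWorker renders them is a separate problem this sketch does not address.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical pre-processing step: rewrite relative <img src> and <a href>
// values to absolute URLs. baseUrl is an assumed parameter (the URL the
// HTML was served from).
string AbsolutizeUrls(string html, Uri baseUrl)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    foreach (var (tag, attr) in new[] { ("img", "src"), ("a", "href") })
    {
        foreach (var node in doc.DocumentNode.Descendants(tag).ToList())
        {
            var value = node.GetAttributeValue(attr, "").Trim();

            // Leave empty values, base64 data: URIs and in-page fragments alone.
            if (value.Length == 0 ||
                value.StartsWith("data:", StringComparison.OrdinalIgnoreCase) ||
                value.StartsWith("#", StringComparison.Ordinal))
                continue;

            // Absolute URLs fail to parse as UriKind.Relative, so only
            // genuinely relative references are resolved against baseUrl.
            if (Uri.TryCreate(value, UriKind.Relative, out var relative) &&
                Uri.TryCreate(baseUrl, relative, out var absolute))
            {
                node.SetAttributeValue(attr, absolute.ToString());
            }
        }
    }
    return doc.DocumentNode.OuterHtml;
}

Console.WriteLine(AbsolutizeUrls(
    "<p><img src=\"images/logo.png\"><a href=\"/contact\">contact</a></p>",
    new Uri("http://localhost/app/report")));
```

The rewritten string would then go into the existing `StringReader(xHtml)` code unchanged. Parsing with `UriKind.Relative` explicitly, instead of `RelativeOrAbsolute`, avoids .NET on Linux treating paths like `/contact` as absolute `file://` URIs.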

Html Agility Pack - Problem selecting subnode

怎甘沉沦 submitted on 2019-12-28 08:39:15
Question: I want to export my Asics running plan to iCal, and since Asics does not offer this service, I decided to build a little scraper for my own personal use. I want to take all the scheduled runs from my plan and generate an iCal feed from them. I am using C# and Html Agility Pack. I want to iterate through all my scheduled runs (they are div nodes) and then select a few different nodes within each run node. My code looks like this: foreach (var run in doc
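The question's code is cut off, so this is only a guess at the problem: a frequent cause is an XPath such as `run.SelectSingleNode("//...")`, which searches the whole document even when called on a sub-node. A sketch of one way around it, walking `run.Descendants()` with LINQ so the search stays inside each run div. It is a top-level C# program assuming the HtmlAgilityPack NuGet package; the class names `run`, `date` and `distance` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// True when the node's class attribute contains the given class token.
bool HasClassToken(HtmlNode node, string cls) =>
    node.GetAttributeValue("class", "")
        .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
        .Contains(cls);

// For each run <div>, look only at that div's own descendants.
// The class names "run", "date" and "distance" are hypothetical.
List<(string Date, string Distance)> ReadRuns(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var result = new List<(string Date, string Distance)>();

    foreach (var run in doc.DocumentNode.Descendants("div").Where(d => HasClassToken(d, "run")))
    {
        // Searches run.Descendants(), never the whole document.
        string Field(string cls) =>
            run.Descendants().FirstOrDefault(n => HasClassToken(n, cls))?.InnerText.Trim() ?? "";

        result.Add((Field("date"), Field("distance")));
    }
    return result;
}

var plan = "<div class=\"run\"><span class=\"date\">Mon</span><span class=\"distance\">5 km</span></div>";
foreach (var (date, distance) in ReadRuns(plan))
    Console.WriteLine($"{date}: {distance}");
```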

Grab all text from html with Html Agility Pack

こ雲淡風輕ζ submitted on 2019-12-27 17:38:01
Question: Input: <html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>
Output: foo bar baz
I know of htmldoc.DocumentNode.InnerText, but it gives foobarbaz. I want to get each piece of text separately, not all at once.
Answer 1: var root = doc.DocumentNode; var sb = new StringBuilder(); foreach (var node in root.DescendantNodesAndSelf()) { if (!node.HasChildNodes) { string text = node.InnerText; if (!string.IsNullOrEmpty(text)) sb.AppendLine(text.Trim()); } } This does what you need, but I am
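A variant of the answer's approach, sketched as a top-level C# program assuming the HtmlAgilityPack NuGet package. It selects text nodes by `NodeType` instead of by "has no children", decodes entities with `HtmlEntity.DeEntitize`, and joins the pieces with spaces to match the requested output. Skipping `<script>` and `<style>` content is an added assumption about what "text" should mean.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Collects each text node separately (unlike DocumentNode.InnerText, which
// concatenates them), trims and decodes it, and joins the pieces with spaces.
string ExtractText(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var pieces = doc.DocumentNode
        .DescendantsAndSelf()
        .Where(n => n.NodeType == HtmlNodeType.Text)
        // Skip anything inside <script>/<style>, which is not visible text.
        .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
        .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
        .Where(t => t.Length > 0);
    return string.Join(" ", pieces);
}

// prints "foo bar baz"
Console.WriteLine(ExtractText("<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>"));
```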

Can i serialize HtmlAgilityPack.HtmlDocument [closed]

混江龙づ霸主 submitted on 2019-12-25 19:35:13
Question: Server Error in '/' Application. Type 'HtmlAgilityPack.HtmlDocument' in Assembly 'HtmlAgilityPack, Version=1.4.0.0, Culture=neutral, PublicKeyToken=bd319b19eaf3b43a' is not marked as serializable. Description: An
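The error says `HtmlDocument` is not marked as serializable. The question is cut off, but the "'/' Application" heading suggests ASP.NET state such as session or ViewState (an assumption). A minimal sketch of one workaround: store the document's markup, which is a plain string, and re-parse it when reading it back. Written as a top-level C# program assuming the HtmlAgilityPack NuGet package; `ToStorable` and `FromStorable` are hypothetical names.

```csharp
using System;
using HtmlAgilityPack;

// HtmlDocument itself cannot be serialized, but its markup is a string.
string ToStorable(HtmlDocument doc) => doc.DocumentNode.OuterHtml;

// Rebuild an equivalent document from the stored markup.
HtmlDocument FromStorable(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    return doc;
}

var original = new HtmlDocument();
original.LoadHtml("<div id='main'><p>hello</p></div>");
var restored = FromStorable(ToStorable(original));
Console.WriteLine(restored.DocumentNode.SelectSingleNode("//p").InnerText);
```

The trade-off is re-parsing cost on every read, in exchange for state that any serializer can handle.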

HtmlAgilityPack query returning no value

浪子不回头ぞ submitted on 2019-12-25 11:09:21
Question: I've been struggling with this for two days. I'm using C# and HtmlAgilityPack in a .NET 4.5 WinForms project to extract data from a website (the fields I want to extract are $ flow and B/S ratio). My query reaches the field, but I get flow : \n\t\t\t instead of flow 245 M, i.e. no value. I have no idea why the query returns no value when I can see it on the web page. I'd like to know if someone else can find why my query returns nodes == null. This is the URL of the queried web page: http://finance
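A diagnostic sketch that separates the two symptoms in the question: a query that matches nothing (`SelectSingleNode` returns null) versus a matched element whose text is only whitespace. The second case is what one would expect if the page fills in the value with JavaScript after loading, since HtmlAgilityPack parses the downloaded HTML and does not run scripts; that is one common cause, not a confirmed diagnosis for this site. Top-level C# program assuming the HtmlAgilityPack NuGet package; `ReadField` and the `flow` id are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

// Returns null when the XPath matches nothing, "" when the element exists
// but contains only whitespace, otherwise the trimmed, decoded text.
string ReadField(string html, string xpath)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var node = doc.DocumentNode.SelectSingleNode(xpath);
    return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
}

// Hypothetical markup: the element exists but its value is empty,
// as it would be if a script were meant to fill it in later.
var downloaded = "<table><tr><td id='flow'>\n\t\t\t</td></tr></table>";
var value = ReadField(downloaded, "//td[@id='flow']");
Console.WriteLine(value == null ? "no match" : value.Length == 0 ? "matched, but empty" : value);
```

If the value is empty in the downloaded HTML, the browser's network tab may show the separate request that supplies it.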

Foreach not iterating through elements

南楼画角 submitted on 2019-12-25 09:14:20
Question: I have an HTML document and I'm getting elements based on a class. Once I have them, I go through each element and get further elements: var doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(content); var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]"); foreach (var row in rows) { var name = row.SelectSingleNode("//span[contains(@class, 'name')]").InnerText, var surname = row.SelectSingleNode("//span[contains(@class, 'surname')]").InnerText, customers.Add(new
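In the question's loop, `row.SelectSingleNode("//span...")` starts from the document root, so every iteration returns the first matching span in the whole page; prefixing the path with `.` makes it relative to `row`. A corrected sketch as a top-level C# program, assuming the HtmlAgilityPack NuGet package. The `(Name, Surname)` tuple stands in for the question's truncated `new ...` customer type. It also matches class tokens exactly, because `contains(@class, 'name')` would also match `surname`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

List<(string Name, string Surname)> ReadCustomers(string content)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(content);
    var customers = new List<(string Name, string Surname)>();

    // SelectNodes returns null, not an empty collection, when nothing matches.
    var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]");
    if (rows == null) return customers;

    foreach (var row in rows)
    {
        // ".//" searches only inside the current row; the concat/normalize-space
        // form matches whole class tokens, so 'name' does not match 'surname'.
        var name = row.SelectSingleNode(
            ".//span[contains(concat(' ', normalize-space(@class), ' '), ' name ')]")?.InnerText.Trim();
        var surname = row.SelectSingleNode(
            ".//span[contains(concat(' ', normalize-space(@class), ' '), ' surname ')]")?.InnerText.Trim();
        customers.Add((name ?? "", surname ?? ""));
    }
    return customers;
}

var page = "<table><tr class='row'><td><span class='name'>Ann</span><span class='surname'>Lee</span></td></tr></table>";
foreach (var c in ReadCustomers(page))
    Console.WriteLine($"{c.Name} {c.Surname}");
```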

Retrieve parts of text inside <li>

左心房为你撑大大i submitted on 2019-12-25 09:13:54
Question: I have HTML like this: <li class="in-ttl-b">(a) kanji; a Chinese character [ideograph] <ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">漢字で書く</span></li><li class="text-jeen text-c">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul> <ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">常用漢字</span></li><li class="text-jeen text-c"><i>Chinese characters</i> for everyday use (in Japan)</li></ul> </li> How can I get only kanji; a Chinese
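A sketch of one way to get only the `<li>`'s own text: concatenate its direct child text nodes, which excludes everything inside the nested `<ul>` elements. Top-level C# program assuming the HtmlAgilityPack NuGet package; `OwnText` is a hypothetical helper name, and the sample markup is a shortened copy of the question's.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Joins only the direct text-node children of a node, so text inside
// nested elements (here the <ul> lists) is ignored.
string OwnText(HtmlNode node) =>
    HtmlEntity.DeEntitize(string.Concat(
        node.ChildNodes
            .Where(child => child.NodeType == HtmlNodeType.Text)
            .Select(child => child.InnerText))).Trim();

var html = @"<li class=""in-ttl-b"">(a) kanji; a Chinese character [ideograph] <ul class=""list-data-b-in""><li class=""text-jejp text-c""><span class=""ex"">漢字で書く</span></li><li class=""text-jeen text-c"">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul> </li>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var li = doc.DocumentNode.SelectSingleNode("//li[@class='in-ttl-b']");
Console.WriteLine(OwnText(li));
```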

Selective screen scraping with HTMLAgilityPack and XPath

徘徊边缘 submitted on 2019-12-25 03:44:31
Question: [A related question: Screen scraping with htmlAgilityPack and XPath] I have some HTML to parse which generally looks like this: ... <tr> <td><a href="" title="">Text Data here (1)</a></td> <td>Text Data here(2)</td> <td>Text Data here(3)</td> <td>Text Data here(4)</td> <td>Text Data here(5)</td> <td>Text Data here(6)</td> <td><a href="link here {1}" class="image"><img alt="" src="" /></a></td> </tr> <tr> <td><a href="" title="">Text Data here (1)</a></td> <td
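Assuming each row follows the layout shown (six text cells, then a cell holding a link), a sketch that pulls the six texts and the link's `href` from every row, using relative XPath (`./td`, `.//a`) so each lookup stays inside its own row. Top-level C# program assuming the HtmlAgilityPack NuGet package; `ReadRows` is hypothetical, and skipping rows with fewer than seven cells is an added assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// For each <tr>: trimmed text of the first six cells plus the href of the
// link in the seventh cell. Rows with fewer than seven cells are skipped.
List<(string[] Texts, string Link)> ReadRows(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var result = new List<(string[] Texts, string Link)>();

    var rows = doc.DocumentNode.SelectNodes("//tr");
    if (rows == null) return result;

    foreach (var row in rows)
    {
        var cells = row.SelectNodes("./td");   // direct <td> children of this row only
        if (cells == null || cells.Count < 7) continue;

        var texts = cells.Take(6)
                         .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                         .ToArray();
        var link = cells[6].SelectSingleNode(".//a")?.GetAttributeValue("href", "") ?? "";
        result.Add((texts, link));
    }
    return result;
}

var table = "<table><tr><td><a href=\"\">One</a></td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td><a href=\"page-1\" class=\"image\"><img alt=\"\" src=\"\" /></a></td></tr></table>";
foreach (var (texts, link) in ReadRows(table))
    Console.WriteLine(string.Join(" | ", texts) + " -> " + link);
```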