html-agility-pack | 易学教程

Get a specific option in HtmlAgilityPack?

Question: Is it possible to get a specific option with HtmlAgilityPack? For example, I have a select like this: <select id="foo"> <option value="0">1</option> <option value="1" selected="selected">2</option> </select> I need to get the option that is marked selected. I know how to get all the options with: doc.DocumentNode.SelectNodes("//select[@id='foo']//option");

Answer 1: This should work: doc.DocumentNode.SelectNodes("//select[@id='foo']/option[@selected='selected']"); You can read more about XPath here

Answer 2: doc
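The XPath from the first answer can be tried out with a small helper. This is only a sketch: the class and method names are made up for illustration, and it uses the predicate [@selected] rather than [@selected='selected'] so that a bare selected attribute with no value would also match. It reads the value attribute instead of the option's text.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class SelectedOptionExample
{
    // Returns the value attribute of the selected <option> inside
    // <select id="foo">, or null when no option is selected.
    public static string GetSelectedValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode option = doc.DocumentNode.SelectSingleNode(
            "//select[@id='foo']/option[@selected]");

        return option == null
            ? null
            : option.GetAttributeValue("value", string.Empty);
    }
}
```

Called on the select from the question, GetSelectedValue should return the string "1".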

How to get html elements with multiple css classes

Question: I know how to get a list of divs with the same CSS class, e.g. <div class="class1">1</div> <div class="class1">2</div>, using the XPath //div[@class='class1']. But what if a div has multiple classes, e.g. <div class="class1 class2">1</div>? What would the XPath look like then?

Answer 1: The expression you're looking for is: //div[contains(@class, 'class1') and contains(@class, 'class2')] I highly recommend XPath Visualizer, which can help you debug XPath expressions easily. It can be found here: http:/
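One caveat with contains(@class, 'class1') is that it is a substring test, so it also matches a class such as class10. A common XPath 1.0 workaround pads the attribute with spaces and tests for the whole token. The sketch below assumes that variation; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class MultiClassExample
{
    // Builds a predicate that matches a whole class token, not a substring.
    private static string HasClass(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    // Returns the inner text of every div that carries all the given classes.
    public static List<string> TextOfDivsWithClasses(string html, params string[] classes)
    {
        if (classes.Length == 0)
            throw new ArgumentException("At least one class name is required.");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string predicate = string.Join(" and ", classes.Select(HasClass));
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[" + predicate + "]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText).ToList();
    }
}
```

For example, TextOfDivsWithClasses(html, "class1", "class2") would select <div class="class1 class2"> but skip <div class="class10 class2">.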

Trouble Scraping Web Page With Malformed Content

Question: I have written C# code that uses the HtmlAgilityPack library to scrape a page located at: World's Largest Urban Areas (Page 2). Unfortunately, the page contains malformed content, and I'm at an impasse on how to scrape it. My current code (shown below) freezes while parsing the HTML: HtmlNodeCollection cityRecords = _htmlDocument.DocumentNode.SelectNodes("//table[@class='boldtable']//tr[position() != 1]"); CityNodes = (from node in cityRecords.Descendants() where

HtmlAgilityPack Select individual elements from a list of divs

Question: I am trying to use HtmlAgilityPack to scrape child elements from a list of divs. The outermost parent div is //div[@class='cell in-area-cell middle-cell'], and if I simply iterate through the list I can display all the child content of each parent just fine. But I don't want to display all the content; I would like to pick certain divs, p's and a's from each of the children. The code below only gives me a list of the first //a[@class='listing-name']. It gives me the correct number of
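The question's code is cut off, so the cause can't be confirmed, but a frequent reason for this symptom is calling SelectSingleNode on each parent with an XPath that starts with //. Such a path is evaluated from the document root even when called on a child node, so every iteration returns the first match in the whole document. A minimal sketch, assuming that is the cause and using hypothetical class and method names, prefixes the path with a dot to make it relative:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class RelativeXPathExample
{
    // Collects the text of the listing-name link inside each cell div.
    public static List<string> ListingNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
            "//div[@class='cell in-area-cell middle-cell']");
        if (cells == null)
            return names; // no matching parent divs

        foreach (HtmlNode cell in cells)
        {
            // ".//" searches only the descendants of this cell;
            // a plain "//" would start again from the document root.
            HtmlNode link = cell.SelectSingleNode(".//a[@class='listing-name']");
            if (link != null)
                names.Add(link.InnerText.Trim());
        }
        return names;
    }
}
```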

html agility pack question in parsing

Question: I have this simple string: string testString = "6/21 <span style='font-size: x-small; font-family: Arial'><span style='font-size: 10pt; font-family: Arial'>Just got 78th street</span></span>"; How do I use the Html Agility Pack to parse out just the text? Please note: there is a span nested inside another span. Thanks, Rod.

Answer 1: I think the InnerText property should give just the text - var testString = "6/21 <span style='font-size: x-small; font-family: Arial'><span style='font-size: 10pt;
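The answer's code is cut off, so here is a minimal sketch of the InnerText approach it describes. The class and method names are hypothetical. The call to HtmlEntity.DeEntitize is an addition not mentioned in the answer; it is included because InnerText leaves entities such as &amp; undecoded.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class InnerTextExample
{
    // Returns the text content of an HTML fragment with all tags dropped.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text of every descendant node,
        // so nested spans need no special handling.
        string text = doc.DocumentNode.InnerText;

        // Decode entities such as &amp; (an extra step, see the note above).
        return HtmlEntity.DeEntitize(text);
    }
}
```

For the testString in the question, this should yield "6/21 Just got 78th street".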

HTML agility pack - removing unwanted tags without removing content?

Question: I've seen a few related questions here, but they don't address exactly the problem I am facing. I want to use the HTML Agility Pack to remove unwanted tags from my HTML without losing the content within the tags. For instance, in my scenario, I would like to preserve the tags "b", "i" and "u". For an input like: <p>my paragraph <div>and my <b>div</b></div> are <i>italic</i> and <b>bold</b></p> The resulting HTML should be: my paragraph and my <b>div</b> are <i>italic<

Scraping using Html Agility Package

Question: I am trying to scrape data from a news article using HtmlAgilityPack. The link is: http://www.ndtv.com/india-news/vyapam-scam-documents-show-chief-minister-shivraj-chouhan-delayed-probe-780528 I have written the code below to extract all the comments in this article, but for some reason my variable aTags comes back null. Code: var getHtmlWeb = new HtmlWeb(); var document = getHtmlWeb.Load(txtinputurl.Text); var aTags = document.DocumentNode.SelectNodes("//div[

Xpath table changes as combobox changes too

Question: I'm working on an application in C# that goes to a website and gets some content out of a table. It's working fine, but here is the problem: the table I'm getting the content from changes when I select a different value in a combobox. The XPath that I use always gets the table that is shown first on the website, and I don't know how to get the other ones. I'm posting everything I think is useful for you to help me. The webpage is: http://br.soccerway.com/national/brazil/serie-a/2012

Download all PDF files from crawled links

Question: When I run my code, it says that ProductListPage is null, and after throwing an error it does not proceed. Any ideas how to solve this issue? Should I wait until //div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a is found, or do something else? Here is my current code: HtmlDocument htmlDoc = new HtmlWeb().Load("https://example.com/"); HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4
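SelectNodes returns null rather than an empty collection when the XPath matches nothing, so the result has to be checked before it is enumerated. The sketch below shows one way to guard against that; the class and method names are hypothetical. It does not explain why nothing matched. HtmlWeb.Load does not run JavaScript, so if the page builds those divs with a script, waiting will not make them appear in the loaded document.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class SafeSelectExample
{
    // Returns the href of every product link, or an empty list when
    // the XPath from the question matches nothing.
    public static List<string> ProductLinks(HtmlDocument htmlDoc)
    {
        var links = new List<string>();

        HtmlNodeCollection productListPage = htmlDoc.DocumentNode.SelectNodes(
            "//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");

        // SelectNodes returns null when there are no matches.
        if (productListPage == null)
            return links;

        foreach (HtmlNode a in productListPage)
        {
            string href = a.GetAttributeValue("href", string.Empty);
            if (href.Length > 0)
                links.Add(href);
        }
        return links;
    }
}
```

A caller can then test links.Count == 0 and log or skip the page instead of hitting a NullReferenceException.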