html-agility-pack | 易学教程

Unexpected behaviour while using HttpWebRequest on a form to obtain a table for scraping

Question: I am trying to scrape a website written in PHP to extract some information from a particular table. Here is the scenario: the landing page has a form that takes queries from the user and searches for results based on them. If I ignore those fields and click "Submit", it produces the whole result set (which is what I am interested in). Previously I did not know about the HttpWebRequest class, and I was simply passing the URL to the HtmlWeb.Load(URL) method in the HtmlAgilityPack library and

Parsing Radio button name only bringing up one value

Question: I have written some code to parse the name from some radio buttons.

<div id="First" class="Size-Inputs">
  <input rel="8" type="radio" value="13051374" name="idProduct-13051359"/> <span rel="L">L</span>
  <input rel="8" type="radio" value="13051373" name="idProduct-13051359"/> <span rel="M">M</span>
  <input rel="8" type="radio" value="13051372" name="idProduct-13051359"/> <span rel="S">S</span>
  <input rel="8" type="radio" value="13051375" name="idProduct-13051359"/> <span rel="XL">XL</span>
</div>

XPath for selecting an HTML id that includes a random number

Question: Hi, how would I select all links when they have the following ids?

<a id="List_ctl01_link3" class="content" href="link1.aspx">
<a id="List_ctl02_link3" class="content" href="link2.aspx">
<a id="List_ctl03_link3" class="content" href="link3.aspx">
<a id="List_ctl04_link3" class="content" href="link4.aspx">

And so on... Please note that the last part, "link3", is important and must be included in the XPath. I'm using C# and Html Agility Pack.

Answer 1: In case you use XPath 2.0 you can try match/matches
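HtmlAgilityPack evaluates XPath through the .NET System.Xml.XPath engine, which implements XPath 1.0, so the XPath 2.0 matches() function is not available there. A minimal sketch of an XPath 1.0 alternative follows, assuming the ids always start with "List_ctl" and end with "_link3". The class name LinkSelector, the method name, and the sample markup (with closing tags added and a decoy link) are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkSelector
{
    // XPath 1.0 has no ends-with(), so the suffix test is emulated with
    // substring() and string-length().
    private const string Query =
        "//a[starts-with(@id, 'List_ctl') and " +
        "substring(@id, string-length(@id) - string-length('_link3') + 1) = '_link3']";

    // Returns the href of every <a> whose id matches List_ctl..._link3.
    public static List<string> SelectLink3Hrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(Query);
        if (nodes == null) return new List<string>();

        return nodes.Select(a => a.GetAttributeValue("href", "")).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup; the last link is a decoy that must not match.
        const string html =
            "<a id='List_ctl01_link3' class='content' href='link1.aspx'>One</a>" +
            "<a id='List_ctl02_link3' class='content' href='link2.aspx'>Two</a>" +
            "<a id='List_ctl02_link4' class='content' href='other.aspx'>Other</a>";

        foreach (var href in SelectLink3Hrefs(html))
            Console.WriteLine(href);
    }
}
```

If the middle part of the id does not matter at all, contains(@id, '_link3') is a shorter but looser test, since it would also match ids such as "List_ctl01_link30".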

HtmlAgilityPack: Find second table within a div

Question: I'm trying to parse information from a div that has 3 tables within it. I can get information from the first one without problems. Code so far as follows:

HtmlAgilityPack.HtmlWeb doc = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument htmldocObject = doc.Load(URL);
var res = htmldocObject.DocumentNode.SelectSingleNode("//div[@class='BoxContent']");
var firstTable = res.SelectSingleNode("//table");
var charName = firstTable.ChildNodes[i++].InnerText.Substring(5).Trim();

<div class=
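One thing worth checking in code like the above: an XPath that starts with // searches from the document root even when it is called on a child node, so res.SelectSingleNode("//table") may return the first table of the whole page rather than the first table inside the div. The following is a hedged sketch that scopes the query with .// and picks the second table by index. The page markup is not shown in the excerpt, so the sample HTML and the TableFinder name are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class TableFinder
{
    // Returns the second <table> inside div.BoxContent, or null if it is absent.
    public static HtmlNode SecondTableInBox(HtmlDocument doc)
    {
        var box = doc.DocumentNode.SelectSingleNode("//div[@class='BoxContent']");
        if (box == null) return null;

        // The leading dot keeps the search relative to the div.
        var tables = box.SelectNodes(".//table");
        return tables != null && tables.Count > 1 ? tables[1] : null;
    }

    public static void Main()
    {
        // Hypothetical markup: one table outside the div, three inside it.
        const string html =
            "<table><tr><td>outside</td></tr></table>" +
            "<div class='BoxContent'>" +
            "<table><tr><td>first</td></tr></table>" +
            "<table><tr><td>second</td></tr></table>" +
            "<table><tr><td>third</td></tr></table>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var second = SecondTableInBox(doc);
        Console.WriteLine(second == null ? "(not found)" : second.InnerText);
    }
}
```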

Wrap in an element with HtmlAgilityPack?

Question: I have an HtmlDocument that may or may not have a proper <head> and <body> section, or might just be an HTML fragment. Either way, I want to run it through a function that will ensure that it has (more) proper HTML structure. I know that I can check whether it has a body by seeing if doc.DocumentNode.SelectSingleNode("//body"); is null. If it does not have a body, how would I wrap the contents of doc.DocumentNode in a <body> element and assign it back to the HtmlDocument? Edit: There seems to be some

Parse inner HTML

Question: This is what I want to parse:

<div class="photoBox pB-ms">
  <a href="/user_details?userid=ePDZ9HuMGWR7vs3kLfj3Gg">
    <img width="100" height="100" alt="Photo of Debbie K." src="http://s3-media2.px.yelpcdn.com/photo/xZab5rpdueTCJJuUiBlauA/ms.jpg">
  </a>
</div>

I am using the following XPath to find it:

HtmlNodeCollection bodyNode = htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']");

This works fine and returns all divs with the photoBox class. But when I want to get the href using

Getting HtmlElement based on HtmlAgilityPack.HtmlNode

Question: I use HtmlAgilityPack to parse the HTML document of a WebBrowser control. I am able to find my desired HtmlNode, but after getting the HtmlNode, I want to return the corresponding HtmlElement in the WebBrowser control's Document. HtmlAgilityPack parses an offline copy of the live document, while I want to access live elements of the WebBrowser control in order to read rendered attributes such as currentStyle or runtimeStyle. HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

Get option from specific start position?

Question: I have a select like this:

<select class="foo">
  <option></option>
  <option>item1</option>
  <option>item2</option>
</select>

I need to get only the options that have text inside, so I need to skip the first option and get only item1 and item2. What I did:

var opts = doc.DocumentNode.SelectNodes("//select[@class='foo']//option");

This of course returns 3 options. How can I do this? Thanks.

Answer 1: Working XPath: "//select[@class='foo']//option[string-length(text()) > 0]"

Answer 2: This XPath might work
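The XPath from Answer 1 can be tried as follows. This is a minimal sketch, assuming a recent HtmlAgilityPack release in which the text of an <option> is parsed as a child of that element (older releases are reported to treat <option> as an empty element, in which case the predicate would match nothing). The OptionFilter class and its method name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class OptionFilter
{
    // Returns the text of every <option> under select.foo that has non-empty text.
    public static List<string> NonEmptyOptions(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath taken from Answer 1: keep options whose text node has length > 0.
        var nodes = doc.DocumentNode.SelectNodes(
            "//select[@class='foo']//option[string-length(text()) > 0]");

        // SelectNodes returns null when nothing matches, hence the null check.
        if (nodes == null) return new List<string>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        const string html =
            "<select class='foo'>" +
            "<option></option>" +
            "<option>item1</option>" +
            "<option>item2</option>" +
            "</select>";

        foreach (var text in NonEmptyOptions(html))
            Console.WriteLine(text);
    }
}
```

If options may contain only whitespace, wrapping the text in normalize-space() before measuring its length is a stricter variant of the same predicate.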

Parse image src with HTML Agility Pack

Question: Hi, I am trying to parse a webpage with HTML Agility Pack to get the src of an image. This is the structure of the page:

<div class="post_body">
  <div style="text-align: center;">
    <a href="http://www.engadget.com/2012/02/29/qualcomm-windows-8/">
      <img src="http://www.blogcdn.com/www.engadget.com/media/2012/02/201202297192-1330536971.jpg" style="border-width: 0px; border-style: solid; margin: 4px;">
    </a>
  </div>
<div>

Now I am using this code to attempt to get the src: HtmlWeb hw = new HtmlWeb();

How to get count number of SelectedNode with XPath in C#?

Question: I am using HtmlAgilityPack in my application, and I want to get the item (node) count of the selected nodes, as in the code below:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);
var tagListe = doc.DocumentNode.SelectNodes("//a[@href]");
var divListe = doc.DocumentNode.SelectNodes("//div[@class='o']");

At first, selecting the a[@href] nodes ran successfully, but for the second query, where I want to select the class named "o", there was an error. I want to
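The excerpt does not say what the error was. One common cause in code like this is that SelectNodes returns null, rather than an empty collection, when the XPath matches nothing, so reading .Count on the result throws. A small sketch of a null-safe count follows; NodeCounter and the sample markup are hypothetical names used only for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class NodeCounter
{
    // Counts the nodes matched by an XPath, treating "no match" (null) as zero.
    public static int Count(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes?.Count ?? 0;
    }

    public static void Main()
    {
        // Hypothetical markup standing in for webBrowser1.DocumentText.
        const string html =
            "<a href='page.html'>link</a>" +
            "<div class='o'>1</div>" +
            "<div class='o'>2</div>" +
            "<div class='p'>3</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Console.WriteLine(Count(doc, "//a[@href]"));
        Console.WriteLine(Count(doc, "//div[@class='o']"));
        Console.WriteLine(Count(doc, "//div[@class='missing']"));
    }
}
```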