html-agility-pack

Unexpected behaviour while using HttpWebRequest on a form to obtain a table for scraping

回眸只為那壹抹淺笑 submitted on 2019-12-13 03:57:37
Question: I am trying to scrape a website written in PHP to extract some information from a particular table. Here is the scenario: the landing page has a form that takes queries from the user and searches for results based on them. If I ignore those fields and click "Submit", it produces the whole result set (which is what I am interested in). Previously I did not know about the HttpWebRequest class and was simply passing the URL to the HtmlWeb.Load(URL) method in the HtmlAgilityPack library and

Parsing Radio button name only bringing up one value

和自甴很熟 submitted on 2019-12-13 03:02:11
Question: I have written some code to parse the name from some radio buttons. <div id="First" class="Size-Inputs"> <input rel="8" type="radio" value="13051374" name="idProduct-13051359"/> <span rel="L">L</span> <input rel="8" type="radio" value="13051373" name="idProduct-13051359"/> <span rel="M">M</span> <input rel="8" type="radio" value="13051372" name="idProduct-13051359"/> <span rel="S">S</span> <input rel="8" type="radio" value="13051375" name="idProduct-13051359"/> <span rel="XL">XL</span> </div>

XPath for selecting HTML ids that include a random number

[亡魂溺海] submitted on 2019-12-13 02:34:21
Question: Hi, how would I select all links that have the following ids? <a id="List_ctl01_link3" class="content" href="link1.aspx"> <a id="List_ctl02_link3" class="content" href="link2.aspx"> <a id="List_ctl03_link3" class="content" href="link3.aspx"> <a id="List_ctl04_link3" class="content" href="link4.aspx"> And so on... Please note that the last part, "link3", is important and must be included in the XPath. I'm using C# and Html Agility Pack. Answer 1: If you use XPath 2.0, you can try match/matches
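HtmlAgilityPack evaluates XPath 1.0, which has no `matches()` or `ends-with()`, so the XPath 2.0 route is not available there. A minimal sketch of one XPath 1.0 alternative follows, assuming the ids always start with `List_ctl` and end with `_link3`; the class name `LinkIdFinder`, the method `FindLinkIds`, and the sample markup are hypothetical and not from the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", assumed to be installed

public static class LinkIdFinder
{
    // XPath 1.0 has no ends-with(), so the suffix test is emulated with
    // substring() starting at (length of id - length of suffix + 1).
    private const string Xpath =
        "//a[starts-with(@id, 'List_ctl') and " +
        "substring(@id, string-length(@id) - string-length('_link3') + 1) = '_link3']";

    // Returns the id of every <a> whose id has the assumed prefix and suffix.
    public static List<string> FindLinkIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(Xpath);
        if (nodes == null) return new List<string>();

        return nodes.Select(a => a.GetAttributeValue("id", "")).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup modelled on the question.
        const string html =
            "<a id='List_ctl01_link3' href='link1.aspx'>one</a>" +
            "<a id='List_ctl02_link3' href='link2.aspx'>two</a>" +
            "<a id='List_ctl02_link4' href='other.aspx'>other</a>";

        foreach (var id in FindLinkIds(html)) Console.WriteLine(id);
    }
}
```

The prefix and suffix tests only approximate "any number in the middle"; if the middle part must be strictly numeric, filtering the selected nodes with a regular expression in C# is probably simpler than expressing it in XPath 1.0.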

HtmlAgilityPack: Find second table within a div

别来无恙 submitted on 2019-12-13 01:16:49
Question: I'm trying to parse information from a div that has 3 tables within it. I can get information from the first one without a problem. Code so far is as follows: HtmlAgilityPack.HtmlWeb doc = new HtmlAgilityPack.HtmlWeb(); HtmlAgilityPack.HtmlDocument htmldocObject = doc.Load(URL); var res = htmldocObject.DocumentNode.SelectSingleNode("//div[@class='BoxContent']"); var firstTable = res.SelectSingleNode("//table"); var charName = firstTable.ChildNodes[i++].InnerText.Substring(5).Trim(); <div class=
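One detail in the snippet above is worth illustrating: an XPath that begins with `//` is evaluated from the document root even when it is called on a child node, so `res.SelectSingleNode("//table")` returns the first table in the whole page, not the first table inside the div. The sketch below uses the relative form `.//table` and picks the second match by index. The class name `SecondTableFinder` and the sample markup are hypothetical; only the `BoxContent` class name comes from the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", assumed to be installed

public static class SecondTableFinder
{
    // Returns the trimmed text of the second table inside the BoxContent div,
    // or an empty string when the div or a second table is missing.
    public static string SecondTableText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var box = doc.DocumentNode.SelectSingleNode("//div[@class='BoxContent']");
        if (box == null) return "";

        // The leading "." keeps the search inside the div.
        var tables = box.SelectNodes(".//table");
        if (tables == null || tables.Count < 2) return "";

        return tables[1].InnerText.Trim(); // index 1 is the second table
    }

    public static void Main()
    {
        // Hypothetical markup: one table outside the div, three inside it.
        const string html =
            "<table><tr><td>outside</td></tr></table>" +
            "<div class='BoxContent'>" +
            "<table><tr><td>first</td></tr></table>" +
            "<table><tr><td>second</td></tr></table>" +
            "<table><tr><td>third</td></tr></table>" +
            "</div>";

        Console.WriteLine(SecondTableText(html));
    }
}
```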

Wrap in an element with HtmlAgilityPack?

会有一股神秘感。 submitted on 2019-12-13 00:32:01
Question: I have an HtmlDocument that may or may not have proper <head> and <body> sections, or might just be an HTML fragment. Either way, I want to run it through a function that will ensure it has a (more) proper HTML structure. I know that I can check whether it has a body by seeing if doc.DocumentNode.SelectSingleNode("//body"); is null. If it does not have a body, how would I wrap the contents of doc.DocumentNode in a <body> element and assign it back to the HtmlDocument? Edit: There seems to be some

Parse inner HTML

大城市里の小女人 submitted on 2019-12-12 21:46:27
Question: This is what I want to parse: <div class="photoBox pB-ms"> <a href="/user_details?userid=ePDZ9HuMGWR7vs3kLfj3Gg"> <img width="100" height="100" alt="Photo of Debbie K." src="http://s3-media2.px.yelpcdn.com/photo/xZab5rpdueTCJJuUiBlauA/ms.jpg"> </a> </div> I am using the following XPath to find it: HtmlNodeCollection bodyNode = htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']"); This works and returns all the divs with the photoBox class. But when I want to get the a href using
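Since the question is cut off, the exact failing call is unknown. As a hedged illustration of one way to read the link from each matched div, the sketch below runs a relative XPath (`.//a[@href]`) on every div and reads the attribute with `GetAttributeValue`. The class name `PhotoLinkReader`, the method `PhotoLinks`, and the sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", assumed to be installed

public static class PhotoLinkReader
{
    // Collects the href of the first <a> found inside each photoBox div.
    public static List<string> PhotoLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        var boxes = doc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']");
        if (boxes == null) return hrefs; // SelectNodes returns null on no match

        foreach (var box in boxes)
        {
            // ".//a" searches below this div only; "//a" would restart at the root.
            var link = box.SelectSingleNode(".//a[@href]");
            if (link != null) hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Hypothetical markup with two photo boxes.
        const string html =
            "<div class='photoBox pB-ms'><a href='/user_details?userid=A'><img src='a.jpg'></a></div>" +
            "<div class='photoBox pB-ms'><a href='/user_details?userid=B'><img src='b.jpg'></a></div>";

        foreach (var href in PhotoLinks(html)) Console.WriteLine(href);
    }
}
```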

Getting HtmlElement based on HtmlAgilityPack.HtmlNode

泪湿孤枕 submitted on 2019-12-12 19:53:14
Question: I use HtmlAgilityPack to parse the HTML document of a WebBrowser control. I am able to find my desired HtmlNode, but after getting the HtmlNode, I want to return the corresponding HtmlElement in WebbrowserControl.Document. HtmlAgilityPack parses an offline copy of the live document, whereas I want to access the live elements of the WebBrowser control to read rendered attributes such as currentStyle or runtimeStyle. HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

Get option from specific start position?

五迷三道 submitted on 2019-12-12 19:33:18
Question: I have a select like this: <select class="foo"> <option></option> <option>item1</option> <option>item2</option> </select> I need only the options that have text inside, so I need to skip the first option and get only item1 and item2. What I did: var opts = doc.DocumentNode.SelectNodes("//select[@class='foo']//option"); This of course returns 3 options. How can I do this? Thanks. Answer 1: Working XPath: "//select[@class='foo']//option[string-length( text()) > 0]" Answer 2: This XPath might work
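As an alternative to an XPath predicate, the filtering can also be done in C# after selecting every option. The sketch below is one hedged way to do that and also skips whitespace-only options. It removes the `option` entry from `HtmlNode.ElementsFlags` first, because some older HtmlAgilityPack versions are reported to treat `<option>` as an empty element; treat that as an assumption about your version, and note that the call is harmless when the entry is absent. The class name `OptionReader` and method `NonEmptyOptions` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", assumed to be installed

public static class OptionReader
{
    // Returns the trimmed text of every <option> that contains visible text.
    public static List<string> NonEmptyOptions(string html)
    {
        // Assumption: older versions flag <option> as an empty element, which
        // detaches its text. Removing the flag does nothing if it is not set.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var opts = doc.DocumentNode.SelectNodes("//select[@class='foo']//option");
        if (opts == null) return new List<string>();

        return opts
            .Select(o => o.InnerText.Trim())
            .Where(text => text.Length > 0) // drop empty and whitespace-only options
            .ToList();
    }

    public static void Main()
    {
        const string html =
            "<select class='foo'><option></option>" +
            "<option>item1</option><option>item2</option></select>";

        foreach (var text in NonEmptyOptions(html)) Console.WriteLine(text);
    }
}
```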

Parse image src with HTML Agility Pack

烈酒焚心 submitted on 2019-12-12 19:07:33
Question: Hi, I am trying to parse a webpage with HTML Agility Pack to get the src of an image. This is the structure of the page: <div class="post_body"> <div style="text-align: center;"> <a href="http://www.engadget.com/2012/02/29/qualcomm-windows-8/"> <img src="http://www.blogcdn.com/www.engadget.com/media/2012/02/201202297192-1330536971.jpg" style="border-width: 0px; border-style: solid; margin: 4px;"> </a> </div> <div> Now I am using this code to attempt to get the src: HtmlWeb hw = new HtmlWeb();

How to get the number of nodes selected with XPath in C#?

♀尐吖头ヾ submitted on 2019-12-12 18:17:39
Question: I am using HtmlAgilityPack in my application, and I want to get the item (node) count of the selected nodes, as in the code below: HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(webBrowser1.DocumentText); var tagListe = doc.DocumentNode.SelectNodes("//a[@href]"); var divListe = doc.DocumentNode.SelectNodes("//div[@class='o']"); The first query, which gets the a elements with an href, runs successfully, but the second one, where I want to get the class named "o", raised an error. I want to
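The question is truncated, so the actual error is not known. One common cause, assumed here, is that `SelectNodes` returns null instead of an empty collection when nothing matches, so reading `.Count` on the result throws a NullReferenceException. A minimal sketch of a null-safe count follows; the class name `NodeCounter` and the sample markup are hypothetical.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", assumed to be installed

public static class NodeCounter
{
    // Counts the nodes matched by an XPath, treating "no match" as zero.
    public static int CountNodes(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count; // null means nothing matched
    }

    public static void Main()
    {
        // Hypothetical markup: one link with href, one without, two divs of class "o".
        const string html =
            "<a href='x'>1</a><a>2</a>" +
            "<div class='o'>a</div><div class='o'>b</div>";

        Console.WriteLine(CountNodes(html, "//a[@href]"));
        Console.WriteLine(CountNodes(html, "//div[@class='o']"));
        Console.WriteLine(CountNodes(html, "//div[@class='missing']"));
    }
}
```

Note that `@class='o'` is an exact string comparison, so a div with `class="o selected"` would not be counted; `contains(concat(' ', @class, ' '), ' o ')` is a common XPath 1.0 workaround if that matters.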