html-agility-pack

XPath Query Problem using HTML Agility Pack

末鹿安然 submitted on 2020-01-04 13:30:14
Question: I'm trying to scrape the price field from this website using the HTML Agility Pack. My code is as follows: var web = new HtmlWeb(); var doc = web.Load(String.Format(overClockersURL, componentID)); var priceContent = doc.DocumentNode.SelectSingleNode("//*[@id=\"prodprice\"]"); I obtained the XPath query using Firebug's "Copy as XPath" feature. The problem is that SelectSingleNode returns null; it doesn't seem to find the element specified by the query. I'm a bit stumped

Possible to get HtmlNode's position & length within original input?

岁酱吖の submitted on 2020-01-04 06:52:29
Question: Consider the following HTML fragment (_ is used for whitespace): <head> ... <link ... ___/> <!-- ... --> ... </head> I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so: <head> ... ____________ <!-- ... --> ... </head> The parsing part seems to be working so far; I get the nodes I'm looking for. However, HAP tries to fix the HTML content while I need

Starting Multiple Async Tasks and Process Them As They Complete (C#)

感情迁移 submitted on 2020-01-04 06:15:28
Question: I am trying to learn how to write asynchronous methods and have been struggling to get asynchronous calls to work. What always seems to happen is that the code hangs on the "await" instruction until it eventually seems to time out and crashes the loading form in the same method. There are two main reasons this is strange: (1) the code works flawlessly when it is not asynchronous and is just a simple loop; (2) I copied the MSDN code almost verbatim to convert the code to asynchronous calls here: https:/

Html Agility Pack - how to select correct span class

偶尔善良 submitted on 2020-01-04 02:34:09
Question: I'm trying to find the lowest price on Amazon pages. Let's use this URL as an example: http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=9963BB#/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=E999-4701&rh=i%3Aaps%2Ck%3AE999-4701 I want to find the lowest price, that is, the number to the right of "new from". Here's what I have tried: using (TextWriter tw = new StreamWriter(@"D:\AmazonUrls.txt")) { foreach (string item in list) { var webGet = new HtmlWeb(); var document

HtmlAgilityPack getting page title and H1 tags

ぐ巨炮叔叔 submitted on 2020-01-03 21:00:37
Question: I am trying to get the page title and H1 tags from a web page by doing the following: doc.LoadHtml(htmlSourceCode) txtTitle.Text = doc.GetElementsByTagName("title").InnerText() txtH1.Text = doc.GetElementsByTagName("H1").InnerText() For Each channel In doc.DocumentNode.SelectNodes(".//meta[@name='description']") txtDescription.Text = channel.Attributes("content").Value Next The only part of the code above that works is the txtDescription part. Neither the title nor the H1 works. What type of syntax do I
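
For illustration only, here is a minimal C# sketch (the question itself is in VB.NET) of one way to read the title and H1 text through XPath on DocumentNode instead of GetElementsByTagName. It assumes the HtmlAgilityPack package is referenced; the class and method names (PageInfo, GetTitle, GetH1Texts) are hypothetical and do not come from the original post.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original question.
public static class PageInfo
{
    // Returns the trimmed text of the first <title> element, or "" if there is none.
    public static string GetTitle(HtmlDocument doc)
    {
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? string.Empty : title.InnerText.Trim();
    }

    // Returns the trimmed text of every <h1> element.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static List<string> GetH1Texts(HtmlDocument doc)
    {
        HtmlNodeCollection headings = doc.DocumentNode.SelectNodes("//h1");
        return headings == null
            ? new List<string>()
            : headings.Select(h => h.InnerText.Trim()).ToList();
    }
}
```

A caller would load the document with doc.LoadHtml(htmlSourceCode), as in the question, and then assign PageInfo.GetTitle(doc) to the title text box.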

Remove whitespaces and newlines when parsing with HtmlAgilityPack

元气小坏坏 submitted on 2020-01-03 12:36:46
Question: I tried to parse HTML with the HtmlAgilityPack in the following way: HtmlDocument htmlDoc = new HtmlDocument(); htmlDoc.LoadHtml(xhtmlString); Unfortunately, the xhtmlString contains unnecessary whitespace and newline characters, so the _text of htmlDoc now looks like this: <html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html> This is a problem for me when working with the child elements of the body. What is the easiest

Get web page using HtmlAgilityPack.NETCore

本秂侑毒 submitted on 2020-01-03 09:08:26
Question: I used the HtmlAgilityPack to work with HTML pages. Previously I did this: HtmlWeb web = new HtmlWeb(); HtmlDocument document = web.Load(url); var nodes = document.DocumentNode.SelectNodes("necessary node"); but now I need to use HtmlAgilityPack.NETCore, where HtmlWeb is absent. What should I use instead of HtmlWeb to get the same result? Answer 1: Use HttpClient, the newer way to interact with remote resources via HTTP. As for your solution, you probably need to use the async methods here for
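
The answer's suggestion might look roughly like the sketch below: download the page with HttpClient, then hand the string to HtmlDocument.LoadHtml. This is only an illustration under assumptions; the class and method names (PageLoader, Parse, LoadAsync) are hypothetical, and it presumes a package that exposes the usual HtmlAgilityPack namespace.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original answer.
public static class PageLoader
{
    // A single shared HttpClient instance is reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Parses an HTML string into an HtmlDocument (no network access).
    public static HtmlDocument Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document;
    }

    // Downloads the page at the given URL and parses it.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        string html = await Client.GetStringAsync(url);
        return Parse(html);
    }
}
```

From an async method, var document = await PageLoader.LoadAsync(url); would stand in for web.Load(url), and the rest of the querying code could stay as it was.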

Windows Phone 8 SDK WebClient Encoding Issue

六眼飞鱼酱① submitted on 2020-01-03 04:22:18
Question: I'm trying to parse HTML from a site that uses the windows-1254 charset, but all Turkish characters are shown like this: � � � � � Where is the actual problem? I tried these: webClient.Encoding = System.Text.Encoding.UTF8 webClient.Encoding = System.Text.Encoding.GetString("UTF-8"); and this function: public string ReplaceText(string _text) { _text = _text.Replace("Ä°", "İ").Replace("ı", "ı").Replace("ü", "ü").Replace("ÅŸ", "ş").Replace("Å", "Ş").Replace("ç", "ç").Replace("ö", "ö").Replace("ÄŸ", "ğ"

Get all attribute values of given tag with Html Agility Pack

别等时光非礼了梦想. submitted on 2020-01-01 19:37:08
Question: I want to get all values of the 'id' attribute of 'span' tags with Html Agility Pack, but instead of the attributes I get the tags themselves. Here's the code: private static IEnumerable<string> GetAllID() { HtmlDocument sourceDocument = new HtmlDocument(); sourceDocument.Load(FileName); var nodes = sourceDocument.DocumentNode.SelectNodes( @"//span/@id"); return nodes.Nodes().Select(x => x.Name); } I'd appreciate it if someone could tell me what's wrong here. Answer 1: try var nodes = sourceDocument.DocumentNode
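
Since the answer is cut off, the following is not its text but a hedged sketch of one common approach: select the span elements that carry an id attribute, then read the attribute from each node. It rests on the assumption that SelectNodes hands back element nodes rather than attribute nodes, which matches what the question describes; the names SpanIds and GetAllIds are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not taken from the truncated answer.
public static class SpanIds
{
    // Returns the id value of every <span> that has an id attribute.
    public static IEnumerable<string> GetAllIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the elements, not the attributes; null means no match.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@id]");
        if (nodes == null)
        {
            return Enumerable.Empty<string>();
        }

        // Read the attribute value from each selected element.
        return nodes.Select(n => n.GetAttributeValue("id", string.Empty)).ToList();
    }
}
```

To read from a file as in the question, the same query could be run on a document loaded with sourceDocument.Load(FileName).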