html-agility-pack | 易学教程

XPath Query Problem using HTML Agility Pack

Question: I'm trying to scrape the price field from this website using the HTML Agility Pack. My code is as follows:

    var web = new HtmlWeb();
    var doc = web.Load(String.Format(overClockersURL, componentID));
    var priceContent = doc.DocumentNode.SelectSingleNode("//*[@id=\"prodprice\"]");

I obtained the XPath query by using Firebug's "Copy as XPath" feature. The problem I'm having is that SelectSingleNode is returning null; it doesn't seem to find the element specified by the query. I'm a bit stumped

Possible to get HtmlNode's position & length within original input?

Question: Consider the following HTML fragment (_ is used for whitespace):

    <head> ... <link ... ___/>  ... </head>

I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so:

    <head> ... ____________  ... </head>

The parsing part seems to be working so far; I get the nodes I'm looking for. However, HAP tries to fix the HTML content while I need

Starting Multiple Async Tasks and Process Them As They Complete (C#)

Question: I am trying to learn how to write asynchronous methods and have been banging my head trying to get asynchronous calls to work. What always seems to happen is that the code hangs on the "await" instruction until it eventually seems to time out and crash the loading form in the same method. There are two main reasons this is strange:

1. The code works flawlessly when it is not asynchronous and is just a simple loop.
2. I copied the MSDN code almost verbatim to convert the code to asynchronous calls here: https:/

Html Agility Pack - how to select correct span class

Question: I'm trying to find the lowest price on Amazon pages. Let's use this URL as an example: http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=9963BB#/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=E999-4701&rh=i%3Aaps%2Ck%3AE999-4701

I want to find the lowest price, that is, the number to the right of "new from". Here's what I have tried:

    using (TextWriter tw = new StreamWriter(@"D:\AmazonUrls.txt"))
    {
        foreach (string item in list)
        {
            var webGet = new HtmlWeb();
            var document

HtmlAgilityPack getting page title and H1 tags

Question: Hey all, I am trying to get the page title and H1 tags from a webpage by doing the following:

    doc.LoadHtml(htmlSourceCode)
    txtTitle.Text = doc.GetElementsByTagName("title").InnerText()
    txtH1.Text = doc.GetElementsByTagName("H1").InnerText()
    For Each channel In doc.DocumentNode.SelectNodes(".//meta[@name='description']")
        txtDescription.Text = channel.Attributes("content").Value
    Next

The only part of the code above that works is the txtDescription part. Neither the title nor the H1 lookup works. What type of syntax do i
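The excerpt above breaks off before any answer, so the following is only a sketch of one possible approach, written in C# rather than the question's VB.NET. It assumes the working `SelectNodes` call from the question can be mirrored for `title` and `h1`, since `GetElementsByTagName` is, as far as I know, not a member of HtmlAgilityPack's `HtmlDocument`. The class name and the sample HTML string are hypothetical stand-ins.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TitleAndHeadings
{
    public static void Main()
    {
        // Hypothetical input standing in for htmlSourceCode from the question.
        const string htmlSourceCode =
            "<html><head><title>Sample page</title></head>" +
            "<body><h1>First heading</h1><h1>Second heading</h1></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlSourceCode);

        // SelectSingleNode returns null when nothing matches, so guard the access.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        string title = titleNode?.InnerText.Trim() ?? string.Empty;

        // By default SelectNodes also returns null (not an empty collection) on no match.
        HtmlNodeCollection h1Nodes = doc.DocumentNode.SelectNodes("//h1");
        string[] headings = h1Nodes == null
            ? new string[0]
            : h1Nodes.Select(n => n.InnerText.Trim()).ToArray();

        Console.WriteLine(title);
        Console.WriteLine(string.Join(" | ", headings));
    }
}
```

A page can contain several `h1` elements, which is why the sketch collects them into an array instead of assigning a single value.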

Remove whitespaces and newlines when parsing with HtmlAgilityPack

Question: I tried to parse HTML with the HtmlAgilityPack in the following way:

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(xhtmlString);

Unfortunately the xhtmlString contains unnecessary whitespace and newline characters, so the _text of htmlDoc now looks like this:

    <html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html>

This is a problem for me when working with the child elements of the body. What is the easiest

Get web page using HtmlAgilityPack.NETCore

Question: I used the HtmlAgilityPack to work with HTML pages. Previously I did this:

    HtmlWeb web = new HtmlWeb();
    HtmlDocument document = web.Load(url);
    var nodes = document.DocumentNode.SelectNodes("necessary node");

but now I need to use HtmlAgilityPack.NETCore, where HtmlWeb is absent. What should I use instead of HtmlWeb to get the same result?

Answer 1: Use the HttpClient as a new way to interact with remote resources via HTTP. As for your solution, you probably need to use the async methods here for
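The answer is cut off, so what follows is a minimal sketch of the idea it names (download with `HttpClient`, then parse with `HtmlDocument.LoadHtml`), not a reconstruction of the answerer's code. The class name `PageLoader`, the method names, and the choice to list link targets via `Descendants` are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageLoader
{
    // A single shared HttpClient is generally preferred over one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the page and parses it; stands in for HtmlWeb.Load(url).
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        string html = await Client.GetStringAsync(url);
        return Parse(html);
    }

    // Parsing is kept separate so it can be exercised without network access.
    public static HtmlDocument Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document;
    }

    // Hypothetical example query: collect the href of every anchor element.
    public static List<string> LinkTargets(HtmlDocument document)
    {
        return document.DocumentNode
            .Descendants("a")
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.Length > 0)
            .ToList();
    }
}
```

A caller would write something like `var doc = await PageLoader.LoadAsync(url);` from an async method. The sketch queries with `Descendants` rather than `SelectNodes` on the assumption that XPath support may vary between packages and versions.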

Windows Phone 8 SDK WebClient Encoding Issue

Question: I'm trying to parse HTML from a site that uses the windows-1254 charset, but all Turkish characters are shown like this: � � � � � Where is the actual problem? I tried these:

    webClient.Encoding = System.Text.Encoding.UTF8
    webClient.Encoding = System.Text.Encoding.GetString("UTF-8");

and this function:

    public string ReplaceText(string _text)
    {
        _text = _text.Replace("Ä°", "İ").Replace("Ä±", "ı").Replace("Ã¼", "ü").Replace("ÅŸ", "ş").Replace("Å", "Ş").Replace("Ã§", "ç").Replace("Ã¶", "ö").Replace("ÄŸ", "ğ"

Get all attribute values of given tag with Html Agility Pack

Question: I want to get all values of the 'id' attribute of 'span' tags with Html Agility Pack, but instead of the attributes I get the tags themselves. Here's the code:

    private static IEnumerable<string> GetAllID()
    {
        HtmlDocument sourceDocument = new HtmlDocument();
        sourceDocument.Load(FileName);
        var nodes = sourceDocument.DocumentNode.SelectNodes(@"//span/@id");
        return nodes.Nodes().Select(x => x.Name);
    }

I'd appreciate it if someone could tell me what's wrong here.

Answer 1: try var nodes = sourceDocument.DocumentNode
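Since the answer is truncated, here is a hedged sketch of one common way to get the attribute values; it may or may not match what the answerer went on to write. It assumes that `SelectNodes` hands back element nodes rather than attribute nodes, so the XPath selects spans that have an `id` and the value is then read from each element. The class and method names, and the use of `LoadHtml` on a string instead of `Load(FileName)`, are illustrative choices.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SpanIds
{
    public static IEnumerable<string> GetAllIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only the span elements that carry an id attribute.
        HtmlNodeCollection spans = doc.DocumentNode.SelectNodes("//span[@id]");

        // By default SelectNodes returns null when there are no matches.
        if (spans == null)
        {
            return Enumerable.Empty<string>();
        }

        // Read the attribute value from each element instead of its tag name.
        return spans
            .Select(span => span.GetAttributeValue("id", string.Empty))
            .ToList();
    }
}
```

The original code returned `x.Name`, which is the tag name; reading the attribute through `GetAttributeValue` is what changes the output from tags to ids.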