html-agility-pack

C# HtmlAgilityPack: getting an exception when I add a header in the following code

半世苍凉 submitted on 2019-12-24 10:38:05
Question: I'm getting an exception when I run this code. The exception is "header must be modified using the appropriate property or method." HtmlAgilityPack.HtmlWeb web = new HtmlWeb(); web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"; web.PreRequest += (request) => { request.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"); request.Headers.Add("Accept-Language", "de-DE"); return true; };
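The exception quoted above is what HttpWebRequest raises for restricted headers such as Accept, which have to be set through the matching property rather than through Headers.Add. A minimal sketch of the same handler with that one change follows; the class name HeaderSetup, the helper Configure and the URL http://example.com/ are placeholders for illustration, not part of the original question.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

static class HeaderSetup
{
    // Hypothetical helper: applies the headers from the question to one request.
    public static bool Configure(HttpWebRequest request)
    {
        // "Accept" is a restricted header, so use the property instead of Headers.Add.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        // "Accept-Language" is not restricted and can still go through Headers.Add.
        request.Headers.Add("Accept-Language", "de-DE");
        return true; // returning true lets HtmlWeb continue with the request
    }

    static void Main()
    {
        var web = new HtmlWeb();
        web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36";
        web.PreRequest += request => Configure(request);

        // Placeholder URL; substitute the page you actually want to load.
        HtmlDocument doc = web.Load("http://example.com/");
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

Other restricted headers (for example Referer or Content-Type) follow the same pattern: set request.Referer or request.ContentType instead of adding them to the collection.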

Why would Html Agility Pack miss some image tags?

泄露秘密 submitted on 2019-12-24 08:31:38
Question: I am using the Html Agility Pack and did something like this: HtmlWeb web = new HtmlWeb(); HtmlDocument doc = web.Load("http://test.com"); int count = doc.DocumentNode.SelectNodes("//img").Count(); I get 38 back. When I go to that page and run $('img').size(); I get 43 back. Why is there a difference? Is "//img" just looking for root ones? Is that why I might be missing some? Answer 1: Is "//img" just looking for root ones? No, it is looking for descendant nodes (children, grandchildren, etc. of the
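To illustrate the point about "//img" matching descendants at any depth, here is a small sketch on an in-memory document; the markup and the class name ImgCountDemo are made up for the example. One possible source of a gap like 38 versus 43, offered as an assumption rather than a diagnosis, is that the browser count includes images inserted by scripts, which Html Agility Pack never runs.

```csharp
using System;
using HtmlAgilityPack;

static class ImgCountDemo
{
    // Made-up markup: one image directly under <body>, one nested two levels down.
    public const string Html =
        "<html><body><img src='a.png'><div><span><img src='b.png'></span></div></body></html>";

    public static int Count(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        // SelectNodes returns null when nothing matches, so guard before reading Count.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    static void Main()
    {
        Console.WriteLine(Count("//img"));          // every <img>, at any depth
        Console.WriteLine(Count("/html/body/img")); // only <img> directly under <body>
    }
}
```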

C#: scraping the correct web content following jQuery

蹲街弑〆低调 submitted on 2019-12-24 08:07:40
Question: I've been using HtmlAgilityPack for a while, but the web resource I am working with now has what seems like a jQuery protocol that the browser passes through. What I expect to load is a product page, but what actually loads (verified with a WebBrowser control and with WebClient.DownloadString) is a redirect asking the visitor to select a consultant and sign up with them. In other words, using Chrome's Inspect >> Elements tool, I get: <div data-v-1a7a6550="" class="product-extra-images"> <img data-v

Iterate with all elements and get text?

不羁的心 submitted on 2019-12-24 07:58:13
Question: I am using the following code to get all text from a page into a List<string>: HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(content); foreach (var script in doc.DocumentNode.Descendants("script").ToArray()) script.Remove(); foreach (var style in doc.DocumentNode.Descendants("style").ToArray()) style.Remove(); foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//text()")) { string found = WebUtility.HtmlDecode(node.InnerText.Trim()); if
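The snippet above is cut off at the `if`, so the following is only a sketch of how such a loop might be completed. It assumes the intent is to skip empty strings and collect the rest; the names TextExtractor and ExtractText are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class TextExtractor
{
    // Hypothetical helper: returns the non-empty text fragments of an HTML string.
    public static List<string> ExtractText(string content)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(content);

        // Remove <script> and <style> first; ToArray() avoids mutating while enumerating.
        foreach (var node in doc.DocumentNode.Descendants()
                     .Where(n => n.Name == "script" || n.Name == "style").ToArray())
            node.Remove();

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null) return result;

        foreach (HtmlNode node in textNodes)
        {
            string found = WebUtility.HtmlDecode(node.InnerText).Trim();
            if (found.Length > 0) // assumption: whitespace-only nodes are not wanted
                result.Add(found);
        }
        return result;
    }

    static void Main()
    {
        foreach (string s in ExtractText("<div>Hello <b>World</b><script>var x = 1;</script></div>"))
            Console.WriteLine(s);
    }
}
```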

web query with HtmlAgilityPack throws System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel [duplicate]

徘徊边缘 submitted on 2019-12-24 07:45:47
Question: This question already has answers here: The request was aborted: Could not create SSL/TLS secure channel (41 answers); Debugging failing HTTPS WebRequest (3 answers). Closed last year. Of the several topics on this that I read, the vast majority relate to PayPal, and a few others relate to something called ServicePointManager. This has NO RELATION to the other problems! In this case I'm just trying a basic Html Agility Pack example with no relation to PayPal or
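Since the linked duplicates revolve around ServicePointManager, one thing that may be worth trying is enabling TLS 1.2 before the first request. This is a sketch under the assumption that the program targets a .NET Framework version whose default protocol set is older than what the server accepts; the class name TlsSetup and the URL are placeholders, and the excerpt does not confirm that this resolves the asker's case.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

static class TlsSetup
{
    // Adds TLS 1.2 to the protocols used by HttpWebRequest-based clients such as HtmlWeb.
    public static void EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
    }

    static void Main()
    {
        EnableTls12(); // must run before the first HTTPS request is made

        // Placeholder URL; substitute the HTTPS page that fails for you.
        HtmlDocument doc = new HtmlWeb().Load("https://example.com/");
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title")?.InnerText);
    }
}
```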

Html Agility Pack: Setting an HtmlNode's Attribute Value isn't reflected in the HtmlDocument

霸气de小男生 submitted on 2019-12-24 07:28:09
Question: In Html Agility Pack, when I set an attribute of an HtmlNode, should I see this in the HtmlDocument from which the node was selected? Let's say that htmlDocument is an HtmlDocument. The simplified code looks like this: HtmlNode documentNode = htmlDocument.DocumentNode; HtmlNodeCollection nodeCollection = documentNode.SelectNodes(someXPath); foreach(var node in nodeCollection) if(SomeCondition(node)) node.SetAttributeValue("class","something"); Now, I see the class attribute of node change,
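Nodes returned by SelectNodes are references into the document's own tree, so a quick way to check the behaviour is to re-query the document after the change. The sketch below uses made-up markup and a hypothetical class name AttributeDemo; it does not reproduce the asker's someXPath or SomeCondition.

```csharp
using System;
using HtmlAgilityPack;

static class AttributeDemo
{
    // Sets class="something" on every <span>, then returns the document for inspection.
    public static HtmlDocument Run()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><span class='old'>a</span><span>b</span></div>");

        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//span"))
            node.SetAttributeValue("class", "something");

        return doc;
    }

    static void Main()
    {
        HtmlDocument doc = Run();
        // Re-query the same document: both spans should now match the new class.
        Console.WriteLine(doc.DocumentNode.SelectNodes("//span[@class='something']").Count);
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

If the change seems to be missing, it is worth checking that the HTML being inspected comes from this document instance (for example doc.DocumentNode.OuterHtml or doc.Save) and not from the original input string.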

How to extract a URL's title, images and description using the HTML Agility Pack

£可爱£侵袭症+ submitted on 2019-12-24 07:06:50
Question: I want to extract the title, description and images from a URL using the HTML Agility Pack. So far I have not been able to find an example that is easy to understand and can help me do it. I would appreciate it if someone could help me with an example, so that I can extract the title and description and give the user a choice of image from a series of images (something similar to Facebook when we share a link). Updated: I have placed a label for the title and description, plus a button and a textbox, on the .aspx page, and I fire the following code on
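A minimal sketch of the extraction itself, leaving out the .aspx wiring. It assumes the description lives in a `<meta name="description">` tag and that every `<img src>` is a candidate image; the names PagePreview and PreviewExtractor are invented for the example, and real pages may need Open Graph tags or relative-URL handling on top of this.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for the extracted values.
class PagePreview
{
    public string Title = "";
    public string Description = "";
    public List<string> Images = new List<string>();
}

static class PreviewExtractor
{
    public static PagePreview Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var preview = new PagePreview();

        // SelectSingleNode returns null when the element is missing.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        if (title != null) preview.Title = title.InnerText.Trim();

        HtmlNode meta = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        if (meta != null) preview.Description = meta.GetAttributeValue("content", "");

        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
            foreach (HtmlNode img in images)
                preview.Images.Add(img.GetAttributeValue("src", ""));

        return preview;
    }

    static void Main()
    {
        // To work from a live URL instead, load it with new HtmlWeb().Load(url)
        // and pass doc.DocumentNode.OuterHtml, or adapt Extract to take an HtmlDocument.
        PagePreview p = Extract(
            "<html><head><title> Demo </title><meta name='description' content='A test page'></head>" +
            "<body><img src='one.png'><div><img src='two.png'></div></body></html>");
        Console.WriteLine(p.Title);
        Console.WriteLine(p.Description);
        Console.WriteLine(string.Join(", ", p.Images));
    }
}
```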

HTMLAgilityPack and separating on <br/>

若如初见. submitted on 2019-12-24 06:38:09
Question: I have some HTML which is separated by <br/>, e.g.: Jack Janson <br/> 309 123 456 <br/> My Special Street 43 What is the easiest way to retrieve the information in 3 columns? I am not an XPath expert, so another approach would be to split the string on the line break and just work with the array. Is there a smarter way to do it? Update: Forgot to format the code. Answer 1: In pure XPath over XML, you would use an XPath expression like this: //preceding-sibling::br or //following-sibling::br
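An alternative to the sibling-axis XPath, sketched here, is to read the text nodes that sit between the `<br/>` elements directly from the parent's child list. The wrapping `<div>` is an assumption (the question does not show the container), and BrSplitter is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class BrSplitter
{
    // Returns the trimmed text fragments found between <br/> elements inside the first <div>.
    public static List<string> Split(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode container = doc.DocumentNode.SelectSingleNode("//div");
        if (container == null) return new List<string>();

        return container.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text) // skips the <br/> elements themselves
            .Select(n => n.InnerText.Trim())
            .Where(s => s.Length > 0)
            .ToList();
    }

    static void Main()
    {
        List<string> columns = Split("<div>Jack Janson <br/> 309 123 456 <br/> My Special Street 43</div>");
        foreach (string column in columns)
            Console.WriteLine(column);
    }
}
```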

Html Agility Pack Implementation

▼魔方 西西 submitted on 2019-12-24 03:49:11
Question: I am currently working on a C# program in Assembly where I am trying to implement Google Translate in my program. I am aware that I have to use HTMLAgilityPack in my program for it to work. I found this post and downloaded the HTMLAgilityPack; however, when he says "1) and build the HTMLAgilityPack solution. 2) In your application, add a reference to HTMLAgilityPack.dll in the HTMLAgilityPack\Debug (or Release) \bin folder." I do not know what he wants me to do. So far, I have downloaded and

HTML Agility to extract PHP tags

ε祈祈猫儿з submitted on 2019-12-23 22:18:13
Question: What syntax should be used with HTML Agility Pack to extract all tags from a PHP file? HtmlNodeCollection tags = htmlDoc.DocumentNode.SelectNodes("//??php"); throws an exception (invalid token). I tried escaping ? with ?? and \?. Thanks. Answer 1: HTML Agility Pack does choke on nodes with ? in the name. The simplest option is probably to go through the HTML string before you load it into a document object and replace instances of <? with <php and so on. That doesn't handle any nasty cases like
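The replace-before-load idea from the answer might look roughly like the sketch below. The stand-in element name `php`, the handling of only the long `<?php ... ?>` form, and the class name PhpTagFinder are all assumptions made for illustration. As the answer warns, PHP code that itself contains `<`, `>` or `?>` inside strings will not survive this naive rewrite.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class PhpTagFinder
{
    // Rewrites <?php ... ?> into <php> ... </php> so the blocks become ordinary elements,
    // then returns the trimmed source of each block.
    public static List<string> FindPhpBlocks(string source)
    {
        string rewritten = source
            .Replace("<?php", "<php>")
            .Replace("?>", "</php>");

        var doc = new HtmlDocument();
        doc.LoadHtml(rewritten);

        var blocks = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//php");
        if (nodes == null) return blocks; // no PHP blocks found

        foreach (HtmlNode node in nodes)
            blocks.Add(node.InnerText.Trim());
        return blocks;
    }

    static void Main()
    {
        foreach (string block in FindPhpBlocks("<div>Hello <?php echo $name; ?></div>"))
            Console.WriteLine(block);
    }
}
```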