Load dynamically generated HTML Code in WebClient

Posted by 老子叫甜甜 on 2019-12-08 05:35:56

Question


I am using WebClient.DownloadString to scrape a web page. Unfortunately, DownloadString returns the page source without the CSS and JS updates that Internet Explorer applies while the page loads.

So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?


Answer 1:


So I was wondering how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does?

You can't do that. The WebClient class downloads a SINGLE resource over HTTP. It has no concept of HTML. If you need the resources associated with the page, you will have to use an HTML parser (such as HTML Agility Pack) and, for each CSS and JavaScript reference you encounter in the downloaded HTML, send another HTTP request with the WebClient to retrieve it.

But bear in mind that, depending on the web page you are trying to scrape, things might get more complicated. For example, the page could contain JavaScript that in turn dynamically references and includes other static resources such as JavaScript or CSS. Since a WebClient doesn't execute JavaScript, it might never know about them.
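The approach described in this answer can be sketched roughly as follows. This is a minimal illustration, not a complete scraper: the class and method names (`ResourceScraper`, `ExtractResourceUrls`, `DownloadPageWithResources`) are hypothetical, and to keep the example free of external packages it uses two simple regular expressions as a stand-in for a real HTML parser such as HTML Agility Pack. It assumes quoted attribute values and only looks at `<link rel="stylesheet">` and `<script src>` tags.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: downloads a page, then each stylesheet and script it references.
public static class ResourceScraper
{
    // Rough pattern for <script ... src="...">. Assumes a quoted attribute value.
    private static readonly Regex ScriptSrc = new Regex(
        @"<script\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Rough pattern for <link ... href="..."> where the same tag has rel="stylesheet".
    private static readonly Regex StylesheetLink = new Regex(
        @"<link\b(?=[^>]*\brel\s*=\s*[""']stylesheet[""'])[^>]*?\bhref\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns absolute URLs of the stylesheets, then the scripts, referenced in the HTML.
    public static List<Uri> ExtractResourceUrls(string html, Uri baseUri)
    {
        var urls = new List<Uri>();
        foreach (Regex pattern in new[] { StylesheetLink, ScriptSrc })
        {
            foreach (Match match in pattern.Matches(html))
            {
                // Attribute values may contain entities such as &amp;
                string raw = WebUtility.HtmlDecode(match.Groups[1].Value);

                // Resolve relative references against the page URL.
                Uri absolute;
                if (Uri.TryCreate(baseUri, raw, out absolute) && !urls.Contains(absolute))
                {
                    urls.Add(absolute);
                }
            }
        }
        return urls;
    }

    // Downloads the page and every resource found in its static HTML.
    // Each resource costs one extra HTTP request, as the answer describes.
    public static Dictionary<Uri, string> DownloadPageWithResources(string pageUrl)
    {
        var pageUri = new Uri(pageUrl);
        var contents = new Dictionary<Uri, string>();

        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUri);
            contents[pageUri] = html;

            foreach (Uri resource in ExtractResourceUrls(html, pageUri))
            {
                try
                {
                    contents[resource] = client.DownloadString(resource);
                }
                catch (WebException)
                {
                    // Skip resources that fail to download.
                }
            }
        }
        return contents;
    }
}
```

Calling `ResourceScraper.DownloadPageWithResources(url)` returns the page source plus the text of each referenced file, keyed by absolute URL. It only sees references present in the static HTML, so anything a script injects at runtime is still missed, which is exactly the limitation noted above.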




Answer 2:


The best solution for you is the HTML Agility Pack ( https://htmlagilitypack.codeplex.com/ ). It will download all the content of the web page for you, but I'm not sure whether you can get the CSS and JavaScript code using this tool.



Source: https://stackoverflow.com/questions/17513266/load-dynamically-generated-html-code-in-webclient
