Question
I am using WebClient.DownloadString to scrape a web page. Unfortunately, DownloadString gives me the page source without the CSS and JS updates (the ones made in Internet Explorer while the page loads).
So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?
Answer 1:
"So I was wondering how can I use WebClient to load the whole page the same way internet explorer or WebBrowser control does ?"
You can't do that. The WebClient class is used to download a single resource over HTTP. It doesn't understand the concept of HTML. If you need to download the resources referenced by that HTML, you will have to use an HTML parser (such as HTML Agility Pack) and, for each CSS and JavaScript file you encounter in the downloaded page, send another HTTP request with the WebClient to retrieve it.
But bear in mind that, depending on the web page you are trying to scrape, things might get more complicated. For example, the page could contain JavaScript which in turn dynamically references and includes other static resources such as JavaScript or CSS. Since a WebClient doesn't execute JavaScript, it might never know about them.
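A rough sketch of the parse-then-fetch approach, assuming the HtmlAgilityPack NuGet package is referenced. The class name PageResources, the helper ExtractResourceUrls and the example.com URL are made up for illustration. It only finds stylesheets and scripts that are written statically in the markup, so anything a script injects at runtime is still missed:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

static class PageResources
{
    // Hypothetical helper: returns absolute URLs of the stylesheets and
    // external scripts that appear statically in the given markup.
    public static List<Uri> ExtractResourceUrls(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var urls = new List<Uri>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//link[@href]");
        if (links != null)
        {
            foreach (var link in links)
            {
                // Keep only stylesheets; skip icons, feeds and similar links.
                string rel = link.GetAttributeValue("rel", "");
                if (rel.Equals("stylesheet", StringComparison.OrdinalIgnoreCase))
                    AddAbsolute(urls, pageUri, link.GetAttributeValue("href", ""));
            }
        }

        var scripts = doc.DocumentNode.SelectNodes("//script[@src]");
        if (scripts != null)
        {
            foreach (var script in scripts)
                AddAbsolute(urls, pageUri, script.GetAttributeValue("src", ""));
        }
        return urls;
    }

    // Resolves a possibly relative reference against the page address.
    static void AddAbsolute(List<Uri> urls, Uri pageUri, string reference)
    {
        Uri absolute;
        if (reference.Length > 0 && Uri.TryCreate(pageUri, reference, out absolute))
            urls.Add(absolute);
    }

    static void Main()
    {
        var pageUri = new Uri("http://example.com/"); // placeholder address
        using (var client = new WebClient())
        {
            // One request for the page itself...
            string html = client.DownloadString(pageUri);

            // ...then one more request per referenced resource.
            foreach (Uri url in ExtractResourceUrls(html, pageUri))
            {
                string content = client.DownloadString(url);
                Console.WriteLine("{0}: {1} characters", url, content.Length);
            }
        }
    }
}
```

A failed download (a 404, for instance) throws a WebException here, so real code would probably wrap the inner DownloadString call in a try/catch.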
Answer 2:
The best solution for you is the HTML Agility Pack ( https://htmlagilitypack.codeplex.com/ ). It will download all the content of the web page for you, but I'm not sure whether you can get the CSS and JavaScript code using this tool.
Source: https://stackoverflow.com/questions/17513266/load-dynamically-generated-html-code-in-webclient