问题
Here's the code I'm trying to run:
var wc = new WebClient();
var stream = wc.OpenRead(
"http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");
But I keep getting a 403 forbidden error. Don't understand why. It worked fine for other pages. I can open the page fine in my browser. How can I fix this?
回答1:
I wouldn't normally use OpenRead()
, try DownloadData()
or DownloadString()
instead.
Also it might be that wikipedia is deliberately blocking your request because you have not provided a user agent string:
WebClient client = new WebClient();
client.Headers.Add("user-agent",
"Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
I use WebClient quite often, and learned quite quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. Also, if you make up your own user agent string (eg "my super cool web scraper") you will also be blocked.
[Edit]
I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6 which is not a good idea. Why? Some websites may perform filtering based on IE6 and send anyone with that browser a message or to a different page that says "Please update your browser" - this means you will not get the content you wanted to get.
来源:https://stackoverflow.com/questions/4811936/webclient-forbids-opening-wikipedia-page