WebClient forbids opening wikipedia page?

心不动则不痛 提交于 2019-12-07 17:03:47

问题


Here's the code I'm trying to run:

var wc = new WebClient();
var stream = wc.OpenRead(
             "http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");

But I keep getting a 403 forbidden error. Don't understand why. It worked fine for other pages. I can open the page fine in my browser. How can I fix this?


回答1:


I wouldn't normally use OpenRead(), try DownloadData() or DownloadString() instead.

Also it might be that wikipedia is deliberately blocking your request because you have not provided a user agent string:

WebClient client = new WebClient();
client.Headers.Add("user-agent", 
    "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");

I use WebClient quite often, and learned quite quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. Also, if you make up your own user agent string (eg "my super cool web scraper") you will also be blocked.

[Edit]

I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6 which is not a good idea. Why? Some websites may perform filtering based on IE6 and send anyone with that browser a message or to a different page that says "Please update your browser" - this means you will not get the content you wanted to get.



来源:https://stackoverflow.com/questions/4811936/webclient-forbids-opening-wikipedia-page

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!