RCurl::url.exists() : how to get non-error for redirects (in the 300 range of HTTP status codes)

Posted by 天大地大妈咪最大 on 2019-12-13 03:10:18

Question


I have a bunch of URLs extracted by text-mining some PDF documents. Now I want to test the URLs for validity. Some URLs have junk characters inside or appended, and some are truncated. One approach is to filter them by requesting each one.

To do that, I use the url.exists() function from the RCurl package. The function makes an HTTP HEAD request to each URL using curl and checks the status code.

From the documentation (?url.exists):

 If ‘.header’ is ‘FALSE’, this returns ‘TRUE’ or ‘FALSE’ indicating
 whether the request was successful (had a status with a value in
 the 200 range).

How can I make it return TRUE for URLs that issue a redirect? Redirect status codes are in the 300 range, and they are not really errors.

Or is there a better way, such as grabbing the actual status codes and processing them manually? Should I use a system command here?
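
To make the "process the status codes manually" idea concrete, here is a minimal sketch rather than a definitive answer. The helper names status_ok() and url_reachable() are hypothetical. The sketch assumes that url.exists(url, .header = TRUE) returns the parsed header as a named character vector containing a "status" element, which is how I read the documentation; check this against the installed RCurl version.

```r
# Hypothetical helper: treat any 2xx or 3xx status as "the URL exists".
# Vectorised; values that cannot be parsed as an integer count as failures.
status_ok <- function(status) {
  status <- suppressWarnings(as.integer(status))
  !is.na(status) & status >= 200L & status < 400L
}

# Hypothetical wrapper around RCurl::url.exists() for a single URL.
# Needs the RCurl package and network access when it is called.
url_reachable <- function(url) {
  hdr <- RCurl::url.exists(url, .header = TRUE)

  # A logical result means no header was returned (for example, the
  # request failed), so pass that answer through unchanged.
  if (is.logical(hdr)) {
    return(isTRUE(hdr))
  }

  # Assumption: the parsed header carries the numeric code under "status".
  if (!("status" %in% names(hdr))) {
    return(FALSE)
  }
  status_ok(hdr[["status"]])
}
```

Calling sapply(urls, url_reachable) on a character vector named urls (also hypothetical) would then give one logical per URL. Keeping the range check in its own function makes it easy to adjust which codes count as valid.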

Source: https://stackoverflow.com/questions/15343560/rcurlurl-exists-how-to-get-non-error-for-redirects-in-the-300-range-of-ht
