Debugging RCurl-based authentication & form submission

Submitted by 那年仲夏 on 2019-12-04 21:58:52

I've simplified the code still further:

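library(httr)

# Placeholder credentials; replace with a real SRDA account
USER <- "your_username"
PASS <- "your_password"

base_url <- "http://srda.cse.nd.edu"

# Step 1: log in to the MediaWiki instance. httr reuses one curl handle
# per host, so cookies set by this response are sent with later requests.
loginURL <- modify_url(
  base_url,
  path = "mediawiki/index.php",
  query = list(
    title = "Special:Userlogin",
    action = "submitlogin",
    type = "login",
    wpName1 = USER,
    wpPassword1 = PASS
  )
)
r <- POST(loginURL)
stop_for_status(r)

# Step 2: submit the query form as application/x-www-form-urlencoded
# (encode = "form" replaces the older multipart = FALSE argument)
queryURL <- modify_url(base_url, path = "cgi-bin/form.pl")
query <- list(
  uitems       = "user_name",
  utables      = "sf1104.users a, sf1104.artifact b",
  uwhere       = "a.user_id = b.submitted_by AND b.artifact_id = 304727",
  useparator   = ":",
  append_query = "1"
)
r <- POST(queryURL, body = query, encode = "form")
stop_for_status(r)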

But I'm still getting an HTTP 500 (Internal Server Error). I tried:

  • setting extra cookies that I see in the browser (wiki_dbUserID, wiki_dbUserName)
  • setting the DNT header to 1
  • setting the Referer header to http://srda.cse.nd.edu/cgi-bin/form.pl
  • setting the User-Agent to the same string as Chrome's
  • setting Accept to "text/html"
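For reference, the attempts listed above can be expressed with httr's request-config helpers (`set_cookies()`, `add_headers()`, `user_agent()`, `accept()`) bundled into one config object. This is only a sketch: the cookie values are hypothetical placeholders (the post doesn't give them), the User-Agent string is an arbitrary Chrome-like example rather than the one actually used, and `queryURL`/`query` refer to the question's code.

```r
library(httr)

# Bundle the browser-mimicking tweaks into a single request config.
# Cookie values and the User-Agent string are hypothetical placeholders.
browser_like_config <- function(user_id, user_name) {
  c(
    set_cookies(wiki_dbUserID = user_id, wiki_dbUserName = user_name),
    add_headers(
      DNT     = "1",
      Referer = "http://srda.cse.nd.edu/cgi-bin/form.pl"
    ),
    user_agent(paste(
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
      "(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    )),
    accept("text/html")
  )
}

cfg <- browser_like_config(user_id = "12345", user_name = "SomeUser")

# Usage (performs a network request, so it is left commented out):
# r <- POST(queryURL, cfg, body = query, encode = "form")
```

As the rest of the thread shows, none of these tweaks addresses the actual cause, which is why they made no difference.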

Finally, finally, finally! I have figured out what was causing this problem, which gave me such a headache (figuratively and literally). It forced me to spend a lot of time reading various Internet resources (including many SO questions and answers), debugging my code and communicating with people. The time was not spent in vain, as I learned a lot about RCurl, cookies, Web forms and the HTTP protocol.

The cause turned out to be much simpler than I thought. While the immediate cause of the form submission failure was related to cookie management, the underlying cause was that I used the wrong parameter names (the element IDs) for the authentication form fields. The two pairs were very similar: a single extra character was enough to trigger the whole problem.
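The post doesn't spell out the correct names. In MediaWiki's standard login form, however, the inputs are typically declared as `<input name="wpName" id="wpName1">` and `<input name="wpPassword" id="wpPassword1">`, and a browser submits each field under its `name` attribute, not its `id`. Assuming the SRDA wiki follows that layout, a corrected login could look like the sketch below; it also sends the credentials in the request body, as a browser would. Newer MediaWiki versions additionally expect a hidden login-token field, which this sketch does not handle. The helper names `login_fields()` and `login()` are hypothetical.

```r
library(httr)

base_url <- "http://srda.cse.nd.edu"

# Assumption: the server reads the form fields by their *name* attributes
# (wpName / wpPassword); wpName1 / wpPassword1 are only element ids.
login_fields <- function(user, pass) {
  list(wpName = user, wpPassword = pass)
}

login_url <- modify_url(
  base_url,
  path  = "mediawiki/index.php",
  query = list(title = "Special:Userlogin", action = "submitlogin", type = "login")
)

# Log in; cookies from the response stay in httr's per-host handle, so a
# later POST to base_url (e.g. the cgi-bin/form.pl query) sends them too.
login <- function(user, pass) {
  r <- POST(login_url, body = login_fields(user, pass), encode = "form")
  stop_for_status(r)
  invisible(r)
}

# Usage (needs network access and a real account):
# login("your_username", "your_password")
```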

Lesson learned: when facing issues, especially ones dealing with authentication, it's very important to check all names and IDs carefully, more than once, to make sure they correspond to the ones that are supposed to be used. Thank you to everyone who helped or tried to help me with this issue!
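One way to apply that lesson is to read the submitted field names straight from the form's HTML instead of relying on the element IDs shown in browser developer tools. The sketch below uses the xml2 package on a trimmed, hypothetical copy of a MediaWiki-style login form; against the live site you would parse the fetched login page instead. `form_fields()` is a hypothetical helper.

```r
library(xml2)

# Hypothetical, trimmed MediaWiki-style login form: each input's id
# differs from the name under which the browser actually submits it.
login_html <- '
<form name="userlogin" method="post"
      action="/mediawiki/index.php?title=Special:Userlogin&amp;action=submitlogin&amp;type=login">
  <input type="text" name="wpName" id="wpName1">
  <input type="password" name="wpPassword" id="wpPassword1">
</form>'

# List every input inside a form with both its id and its name,
# so the two can be compared side by side.
form_fields <- function(html) {
  doc <- read_html(html)
  inputs <- xml_find_all(doc, "//form//input")
  data.frame(
    id   = xml_attr(inputs, "id"),
    name = xml_attr(inputs, "name"),
    stringsAsFactors = FALSE
  )
}

fields <- form_fields(login_html)
print(fields)
# The values in the `name` column are the keys to use in the POST body.
```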

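The following clarifies the error situation itself (the HTTP 500 response).

From RFC 2616, the HTTP/1.1 specification (an IETF document, since obsoleted by RFC 7231 and later RFC 9110, which keep the same requirements for these status codes):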

10.5 Server Error 5xx

Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. User agents SHOULD display any included entity to the user. These response codes are applicable to any request method.

10.5.1 500 Internal Server Error

The server encountered an unexpected condition which prevented it from fulfilling the request.

My interpretation of section 10.5 is that the server should provide a more detailed explanation of the error situation than the generic description given in section 10.5.1. However, I recognize that the generic message for status code 500 (section 10.5.1) may well be considered sufficient. Confirmation of either interpretation is welcome!
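Whichever reading is right, it helps to look at the response body rather than only the status line: `stop_for_status()` builds its error from the status code alone, so any explanatory entity the server did send goes unseen. The sketch below (helper names `describe_failure()` and `report_if_error()` are hypothetical) prints the body of a failed httr response instead; the formatting logic is kept in a separate pure function so it can be checked without a network connection.

```r
library(httr)

# Summarise a failure from its status code and the body text (if any).
describe_failure <- function(status, body) {
  label <- sprintf("HTTP %d", as.integer(status))
  if (is.null(body) || is.na(body) || !nzchar(trimws(body))) {
    paste(label, "- no explanatory entity in the response")
  } else {
    paste(label, "- server said:", trimws(body))
  }
}

# For an httr response: report the body of any 4xx/5xx reply, which is
# where RFC 2616 section 10.5 says a 5xx explanation SHOULD appear.
report_if_error <- function(r) {
  if (http_error(r)) {
    body <- content(r, as = "text", encoding = "UTF-8")
    message(describe_failure(status_code(r), body))
  }
  invisible(r)
}

# Usage, in place of stop_for_status(r) in the question's code:
# r <- POST(queryURL, body = query, encode = "form")
# report_if_error(r)
```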
