How to properly set cookies to get URL content using httr

余生分开走 2020-12-19 08:10

I need to download information from a website that is protected by cookies. I get past this protection manually in the browser and then pass the cookies to httr.

Here is

1 Answer
  • 2020-12-19 08:43

    This is how to pass cookies to GET with httr's set_cookies():

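    library(httr)

    # Session cookies copied by hand from a browser that has already passed the site's protection
    res <- GET("http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ",
               set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                           `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                           `__utmb` = "29983421.5.10.1413649536",
                           `__utmc` = "29983421",
                           `__utmt` = "1",
                           `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"))
    stop_for_status(res)  # stop with an error on an HTTP 4xx/5xx response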
    

    That worked for me, or at least I think it did, since I can't read the language: a table comes back with the same structure and there is no login prompt.

    Unfortunately, the captcha on the login page prevents the use of RSelenium (or similar crawling packages), so you'll have to keep grabbing those cookies manually (or use a utility to extract them from the browser session).
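
    If you copy the whole Cookie request header from your browser's developer tools instead of retyping each value, a small helper can split it into the named character vector that set_cookies() accepts through its .cookies argument. This is a minimal sketch: parse_cookie_header() and fetch_with_cookies() are hypothetical names, and it assumes the header uses the usual "name=value; name=value" form.

    ```r
    # Hypothetical helper: turn a browser "Cookie:" header string such as
    # "a=1; b=2" into a named character vector c(a = "1", b = "2").
    parse_cookie_header <- function(header) {
      pairs <- trimws(strsplit(header, ";", fixed = TRUE)[[1]])
      pairs <- pairs[nzchar(pairs)]
      # Split on the first "=" only: values such as __utmz contain "=" themselves
      eq <- regexpr("=", pairs, fixed = TRUE)
      pairs <- pairs[eq > 0]
      eq <- eq[eq > 0]
      values <- substring(pairs, eq + 1)
      names(values) <- substring(pairs, 1, eq - 1)
      values
    }

    # Hypothetical wrapper: fetch a URL with those cookies (httr is only needed when it is called)
    fetch_with_cookies <- function(url, cookie_header) {
      res <- httr::GET(url, httr::set_cookies(.cookies = parse_cookie_header(cookie_header)))
      httr::stop_for_status(res)
      res
    }
    ```

    Usage would look like res <- fetch_with_cookies(url, "_SMIDA=...; __utmc=..."), pasting the header exactly as the browser shows it. Splitting on the first "=" only keeps values like __utmz intact.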

    Finally, since those session cookies are now posted publicly, you should seriously consider changing your credentials right away :-)


    EDIT: @VadymB and I both found that the code didn't work until we restarted RStudio. Your mileage may vary.
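
    One possible explanation (an assumption, not something verified here): httr reuses a cached connection handle per host, and that handle keeps its own cookie jar, so cookies left over from an earlier failed attempt can persist for the rest of the R session. If that is the cause, resetting the handle for the site may be a lighter alternative to restarting RStudio. A minimal sketch, where fetch_fresh() is a hypothetical name:

    ```r
    # Hypothetical helper: drop httr's cached handle for this host (and any
    # cookies stored in it), then request the page, which creates a fresh handle.
    fetch_fresh <- function(url, cookies) {
      httr::handle_reset(url)
      res <- httr::GET(url, httr::set_cookies(.cookies = cookies))
      httr::stop_for_status(res)
      res
    }
    ```

    Here cookies is a named character vector such as c(`_SMIDA` = "...", `__utmc` = "..."). Passing handle = httr::handle(url) to GET() is another way to use a one-off handle for a single request.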
