Question
I am looking at this great answer: https://stackoverflow.com/a/58211397/3502164.
The beginning of the solution includes:
library(httr)
library(xml2)
gr <- GET("https://nzffdms.niwa.co.nz/search")
doc <- read_html(content(gr, "text"))
xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
Output is constant across multiple requests:
"59243d3a233492e9461f8f73136118f9"
My default approach so far would have been:
doc <- read_html("https://nzffdms.niwa.co.nz/search")
xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
That result differs from the output above and changes across multiple requests.
Question:
What is the difference between:
read_html(url)
read_html(content(GET(url), "text"))
Why do they return different values, and why does only the "GET" solution return the CSV in the linked question?
(I hope it's OK to structure this as roughly three sub-questions.)
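As a point of reference for the second form: content(gr, "text") is a character string holding the page's HTML, and read_html() parses a string containing markup directly instead of fetching anything. A minimal offline sketch, where the HTML and the token value abc123 are made up and merely stand in for the real response body:

```r
library(xml2)

# Hypothetical stand-in for content(gr, "text"): a literal HTML string,
# not the real page. The token value is invented for illustration.
html_text <- paste0(
  "<html><body><form>",
  "<input type='hidden' name='search[_csrf_token]' value='abc123'>",
  "</form></body></html>"
)

# A string that contains markup is parsed as literal HTML,
# so no network request happens here.
doc <- read_html(html_text)

# Same XPath as in the question.
token <- xml_attr(
  xml_find_all(doc, ".//input[@name='search[_csrf_token]']"),
  "value"
)
print(token)
```

So the parsing step looks the same in both forms. What seems to differ is which package performs the HTTP request: xml2 opening the URL itself, versus httr's GET().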
What I tried:
Going down the rabbit hole of function calls:
read_html
(ms <- methods("read_html"))
getAnywhere(ms[1])
xml2:::read_html
xml2:::read_html.default
#xml2:::read_html.response
read_xml
(ms <- methods("read_xml"))
getAnywhere(ms[1])
But that led to this question: Find the correct method
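A related way to inspect a single method, sketched under the assumption that xml2 is installed: getS3method() takes the generic and the class name separately, which avoids indexing into the methods() result. The class names "default" and "response" come from the calls listed above. Whether a "response" method is registered may depend on the xml2 version, hence optional = TRUE.

```r
library(xml2)

# Methods registered for the generic; the exact list depends on the xml2 version.
print(methods("read_html"))

# Retrieve one method by generic and class name from the xml2 namespace.
default_method <- getS3method("read_html", "default",
                              envir = asNamespace("xml2"))
print(is.function(default_method))

# optional = TRUE returns NULL instead of raising an error
# if no such method is found.
response_method <- getS3method("read_html", "response",
                               optional = TRUE,
                               envir = asNamespace("xml2"))
print(is.null(response_method))

# Dispatch follows the class of the first argument:
# a URL passed as a string is a plain character vector.
print(class("https://nzffdms.niwa.co.nz/search"))
```

Since content(gr, "text") also returns a character vector, both forms in the question presumably go through the same character-input path rather than through a response method.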
Thoughts:
I don't see that the GET request takes any headers or cookies that could explain the different responses.
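One way to test whether session state plays a role: httr's documentation states that cookies persist automatically across requests to the same domain, so a token tied to a session cookie could stay constant across GET() calls but change on fresh connections. The sketch below is a diagnostic under that assumption, not a confirmed explanation. extract_token and compare_tokens are hypothetical helper names, and the result depends on how the server behaves today.

```r
library(httr)
library(xml2)

# Pull the CSRF token out of a string of HTML (same XPath as above).
extract_token <- function(html_text) {
  doc <- read_html(html_text)
  xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
}

# Hypothetical helper: compare tokens with and without httr's pooled handle.
compare_tokens <- function(url) {
  first  <- GET(url)
  second <- GET(url)       # reuses the pooled handle for this host
  print(cookies(first))    # cookies the server set, if any

  handle_reset(url)        # discard the pooled handle and its cookies
  fresh <- GET(url)

  token_first <- extract_token(content(first, "text"))
  c(
    same_handle = identical(token_first,
                            extract_token(content(second, "text"))),
    after_reset = identical(token_first,
                            extract_token(content(fresh, "text")))
  )
}

# Needs network access, so it is left commented out here:
# compare_tokens("https://nzffdms.niwa.co.nz/search")
```

If same_handle came back TRUE and after_reset FALSE, that would point to the token being bound to a cookie kept on the reused handle.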
From my understanding, both read_html and read_html(content(GET(.), "text")) will return XML/HTML.
Final thought: Following https://stackoverflow.com/a/29045432/3502164 - 4, I could imagine that each GET request is handled over the same local port while read_html takes a new port for each request. But that would not explain why the read_html requests do not work, that is, do not result in the desired .csv format.
Source: https://stackoverflow.com/questions/58219503/difference-between-read-htmlurl-and-read-htmlcontentgeturl-text