Difference between read_html(url) and read_html(content(GET(url), "text"))

Question

I am looking at this great answer: https://stackoverflow.com/a/58211397/3502164.

The beginning of the solution includes:

library(httr)
library(xml2)

gr <- GET("https://nzffdms.niwa.co.nz/search")
doc <- read_html(content(gr, "text"))

xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
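The token extraction above can be tried offline by parsing a literal HTML string instead of the live page. A minimal sketch: `page_text` is a hypothetical stand-in for the page body, and the token value `abc123` is made up.

```r
library(xml2)

# Hypothetical stand-in for the page body; the token value "abc123" is invented.
page_text <- '<html><body><form>
  <input type="hidden" name="search[_csrf_token]" value="abc123">
</form></body></html>'

# A string containing markup is parsed directly by read_html()
doc <- read_html(page_text)

# Same XPath as above: find the hidden input and read its "value" attribute
token_nodes <- xml_find_all(doc, ".//input[@name='search[_csrf_token]']")
token <- xml_attr(token_nodes, "value")
print(token)
```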

The output is the same across multiple requests:

"59243d3a233492e9461f8f73136118f9"

My default approach so far would have been:

doc <- read_html("https://nzffdms.niwa.co.nz/search")
xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")

That result differs from the output above and changes across multiple requests.

Question:

What is the difference between:

read_html(url)
read_html(content(GET(url), "text"))
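One way to frame the difference: `content(gr, "text")` returns the response body as a single character string, so both calls hand `read_html()` a character vector and S3 dispatch alone cannot separate them. xml2 documents that such a string may be a path, a URL, or literal markup; in the first call `read_html()` fetches the URL itself, in the second it only parses text that httr has already downloaded. A minimal offline sketch, where `page_text` is a hypothetical stand-in for the downloaded body:

```r
library(xml2)

url <- "https://nzffdms.niwa.co.nz/search"
# Hypothetical stand-in for content(GET(url), "text"): the already-downloaded body
page_text <- "<html><body><p>already downloaded</p></body></html>"

# Both arguments are plain character vectors
print(class(url))
print(class(page_text))

# A string with markup is parsed as-is, without any network access
doc <- read_html(page_text)
print(xml_text(xml_find_first(doc, "//p")))
```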

Why do they return different values, and why does only the GET solution return the CSV in the linked question?

(I hope it is OK to structure this as three sub-questions.)

What I tried:

Going down the rabbit hole of function calls:

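read_html                       # print the generic, which dispatches via UseMethod()
(ms <- methods("read_html"))    # list the S3 methods registered for read_html
getAnywhere(ms[1])              # show the source of the first listed method
xml2:::read_html
xml2:::read_html.default
#xml2:::read_html.response

read_xml                        # read_html builds on read_xml, so inspect it too
(ms <- methods("read_xml"))     # list the S3 methods registered for read_xml
getAnywhere(ms[1])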
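Instead of indexing into the `methods()` listings, base R's `getS3method()` can print a specific method directly. A sketch under two assumptions: that `read_html.default` delegates to `read_xml()`, and that xml2 registers a `read_xml.character` method for string input (looked up with `optional = TRUE`, so nothing fails if that name is wrong):

```r
library(xml2)

# Print the method that handles character input to read_html()
html_default <- getS3method("read_html", "default")
print(html_default)

# Assumption: the work is then done by read_xml()'s character method;
# optional = TRUE returns NULL instead of erroring if it does not exist
xml_character <- getS3method("read_xml", "character", optional = TRUE)
if (!is.null(xml_character)) print(xml_character)
```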

But that only led to another question of mine: "Find the correct method".

Thoughts:

I don't see the GET request sending any headers or cookies that could explain the different responses.

As I understand it, both read_html(url) and read_html(content(GET(url), "text")) return a parsed HTML/XML document.

Final thought: following https://stackoverflow.com/a/29045432/3502164 (point 4), I could imagine that every GET request goes through the same local port while read_html opens a new port for each request. But that would not explain why the read_html request does not work, i.e. does not produce the desired .csv format.

Source: https://stackoverflow.com/questions/58219503/difference-between-read-htmlurl-and-read-htmlcontentgeturl-text

Tags

get

xml2