R XML Parse for a web address

清酒与你 2021-01-06 17:29

I am trying to download weather data, similar to the question asked here: How to parse XML to R data frame. But when I run the first line in the example, I get an error beginning "Error: 1: fa…"

2 Answers
  • 2021-01-06 17:31

    You can download the file by setting a User-Agent as follows:

    require(httr)

    # a browser-like User-Agent string; the site appears to reject requests without one
    UA <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
    my_url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"

    # fetch the page, identifying ourselves via the User-Agent header
    doc <- GET(my_url, user_agent(UA))
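
    Before going further, it is worth checking that the request actually succeeded; httr's status_code() and http_error() helpers cover this:

    status_code(doc)   # should be 200 when the server accepts the request
    http_error(doc)    # TRUE if the server returned a 4xx/5xx status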
    

    Now have a look at content(doc, "text") to see that it is the same file you see in the browser.

    Then you can parse it via XML or xml2. I find xml2 easier but that is just my taste. Both work.

    data <- XML::xmlParse(content(doc, "text"))
    data2 <- xml2::read_xml(content(doc, "text"))
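
    If the goal is a data frame, as in the linked question, here is a minimal sketch with xml2. It assumes the DWML feed exposes start-valid-time entries and an hourly temperature block; print the parsed document first to confirm those node names:

    library(httr)
    library(xml2)

    doc <- GET(my_url, user_agent(UA))  # my_url and UA as defined above
    x   <- read_xml(content(doc, "text"))

    # the node names below are an assumption about the DWML layout
    times <- xml_text(xml_find_all(x, "//time-layout/start-valid-time"))
    temps <- xml_text(xml_find_all(x, "//temperature[@type='hourly']/value"))

    weather <- data.frame(time = times, hourly_temp = as.numeric(temps))
    head(weather)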
    

    Why do I have to use a user agent?
    From the RCurl FAQ: http://www.omegahat.org/RCurl/FAQ.html

    Why doesn't RCurl provide a default value for the useragent that some sites require?
    This is a matter of philosophy. Firstly, libcurl doesn't specify a default value and it is a framework for others to build applications. Similarly, RCurl is a general framework for R programmers to create applications to make "Web" requests. Accordingly, we don't set the user agent either. We expect the R programmer to do this. R programmers using RCurl in an R package to make requests to a site should use the package name (and also the version of R) as the user agent and specify this in all requests.
    Basically, we expect others to specify a meaningful value for useragent so that they identify themselves correctly.

    Note that users (not recommended for programmers) can set the R option named RCurlOptions via R's options() function. The value should be a list of named curl options; these are merged with the options specified in each RCurl request, which allows one to provide default values.
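
    For example, a default user agent for all subsequent RCurl requests could be set like this (a sketch; "my-weather-script/0.1 (R)" is a made-up identifier you would replace with your own):

    require(RCurl)

    # default curl options, merged into every subsequent RCurl request
    options(RCurlOptions = list(useragent = "my-weather-script/0.1 (R)"))

    txt <- getURL("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")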

    I suspect that http://forecast.weather.gov/ rejects all requests that do not send a User-Agent header.

  • 2021-01-06 17:33

    I downloaded this URL to a text file, then read the file's content back and parsed it as XML. Here is my code:

    rm(list = ls())
    require(XML)

    url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"

    # save the response to a local file, then parse that file;
    # parsing the URL directly with xmlParse() reproduces the original error
    download.file(url = url, destfile = "url.txt")
    data <- xmlParse("url.txt")
    
    # convert the parsed XML document into a nested list
    xml_data <- xmlToList(data)

    # coordinates of the forecast point
    location <- as.list(xml_data[["data"]][["location"]][["point"]])

    # start of each forecast interval
    start_time <- unlist(xml_data[["data"]][["time-layout"]][
        names(xml_data[["data"]][["time-layout"]]) == "start-valid-time"])
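
    From here the pieces can be combined into a data frame. A sketch, assuming the first temperature block under parameters holds the hourly values; check names(xml_data[["data"]][["parameters"]]) to confirm:

    # hourly temperature values; the node names are an assumption about this feed
    temp_node <- xml_data[["data"]][["parameters"]][["temperature"]]
    temps <- as.numeric(unlist(temp_node[names(temp_node) == "value"]))

    weather <- data.frame(time = start_time[seq_along(temps)], temp = temps)
    head(weather)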
    