R XML Parse for a web address

清酒与你 2021-01-06 17:29

I am trying to download weather data, similar to the question asked here: How to parse XML to R data frame. But when I run the first line in the example, I get an error beginning "Error: 1: fa…"

2 Answers
  • 2021-01-06 17:31

    You can download the file by setting a User-Agent as follows:

    require(httr)

    # a browser-like User-Agent string; the site appears to reject requests without one
    UA <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
    my_url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"

    # fetch the page, identifying ourselves via the User-Agent header
    doc <- GET(my_url, user_agent(UA))
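
    Before going further, it is worth checking that the request actually succeeded; httr's status_code() and http_error() helpers cover this:

    status_code(doc)   # should be 200 when the server accepts the request
    http_error(doc)    # TRUE if the server returned a 4xx/5xx status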
    

    Now have a look at content(doc, "text") to see that it is the same file you see in the browser.

    Then you can parse it via XML or xml2. I find xml2 easier but that is just my taste. Both work.

    data <- XML::xmlParse(content(doc, "text"))
    data2 <- xml2::read_xml(content(doc, "text"))
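
    If the goal is a data frame, as in the linked question, here is a minimal sketch with xml2. It assumes the DWML feed exposes start-valid-time entries and an hourly temperature block; print the parsed document first to confirm those node names:

    library(httr)
    library(xml2)

    doc <- GET(my_url, user_agent(UA))  # my_url and UA as defined above
    x   <- read_xml(content(doc, "text"))

    # the node names below are an assumption about the DWML layout
    times <- xml_text(xml_find_all(x, "//time-layout/start-valid-time"))
    temps <- xml_text(xml_find_all(x, "//temperature[@type='hourly']/value"))

    weather <- data.frame(time = times, hourly_temp = as.numeric(temps))
    head(weather)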
    

    Why do I have to use a user agent?
    From the RCurl FAQ: http://www.omegahat.org/RCurl/FAQ.html

    Why doesn't RCurl provide a default value for the useragent that some sites require?
    This is a matter of philosophy. Firstly, libcurl doesn't specify a default value and it is a framework for others to build applications. Similarly, RCurl is a general framework for R programmers to create applications to make "Web" requests. Accordingly, we don't set the user agent either. We expect the R programmer to do this. R programmers using RCurl in an R package to make requests to a site should use the package name (and also the version of R) as the user agent and specify this in all requests.
    Basically, we expect others to specify a meaningful value for useragent so that they identify themselves correctly.

    Note that users (not recommended for programmers) can set the R option named RCurlOptions via R's options() function. The value should be a list of named curl options; these are merged with the options specified in each RCurl request, which allows one to provide default values.
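
    For example, a default user agent for all subsequent RCurl requests could be set like this (a sketch; "my-weather-script/0.1 (R)" is a made-up identifier you would replace with your own):

    require(RCurl)

    # default curl options, merged into every subsequent RCurl request
    options(RCurlOptions = list(useragent = "my-weather-script/0.1 (R)"))

    txt <- getURL("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")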

    I suspect that http://forecast.weather.gov/ rejects all requests that do not send a User-Agent header.

  • 2021-01-06 17:33

    I downloaded this URL to a text file, then read the file's content back and parsed it as XML. Here is my code:

    rm(list = ls())
    require(XML)

    url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"

    # save the response to a local file, then parse that file;
    # parsing the URL directly with xmlParse() reproduces the original error
    download.file(url = url, destfile = "url.txt")
    data <- xmlParse("url.txt")
    
    # convert the parsed XML document into a nested list
    xml_data <- xmlToList(data)

    # coordinates of the forecast point
    location <- as.list(xml_data[["data"]][["location"]][["point"]])

    # start of each forecast interval
    start_time <- unlist(xml_data[["data"]][["time-layout"]][
        names(xml_data[["data"]][["time-layout"]]) == "start-valid-time"])
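
    From here the pieces can be combined into a data frame. A sketch, assuming the first temperature block under parameters holds the hourly values; check names(xml_data[["data"]][["parameters"]]) to confirm:

    # hourly temperature values; the node names are an assumption about this feed
    temp_node <- xml_data[["data"]][["parameters"]][["temperature"]]
    temps <- as.numeric(unlist(temp_node[names(temp_node) == "value"]))

    weather <- data.frame(time = start_time[seq_along(temps)], temp = temps)
    head(weather)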
    