How to parse XML to R data frame

离开以前 2020-11-22 15:51

I am trying to parse XML into an R data frame. This link helped me a lot:

how to create an R data frame from a xml file

But still I was not able to figure out my prob

4 answers
  • 2020-11-22 16:40

    Here's a partial solution using xml2. Breaking the solution up into smaller pieces generally makes it easier to ensure everything is lined up:

    library(xml2)
    library(magrittr)  # provides the %>% pipe used below
    data <- read_xml("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
    
    # Point locations
    point <- data %>% xml_find_all("//point")
    point %>% xml_attr("latitude") %>% as.numeric()
    point %>% xml_attr("longitude") %>% as.numeric()
    
    # Start time
    data %>% 
      xml_find_all("//start-valid-time") %>% 
      xml_text()
    
    # Temperature
    data %>% 
      xml_find_all("//temperature[@type='hourly']/value") %>% 
      xml_text() %>% 
      as.integer()
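
    The pieces above can be assembled into a single data frame. The following is only a sketch: it runs on a small inline document that imitates the structure of the feed (sample_xml, doc and forecast are names made up for illustration, and the values are invented), so the real response may need extra handling.

```r
library(xml2)

# Hypothetical, heavily trimmed stand-in for the DWML feed (values made up).
sample_xml <- '<dwml><data>
  <location><point latitude="29.81" longitude="-82.42"/></location>
  <time-layout>
    <start-valid-time>2014-02-14T18:00:00-05:00</start-valid-time>
    <end-valid-time>2014-02-14T19:00:00-05:00</end-valid-time>
    <start-valid-time>2014-02-14T19:00:00-05:00</start-valid-time>
    <end-valid-time>2014-02-14T20:00:00-05:00</end-valid-time>
  </time-layout>
  <parameters>
    <temperature type="hourly"><value>60</value><value>55</value></temperature>
    <temperature type="dew point"><value>41</value><value>40</value></temperature>
  </parameters>
</data></dwml>'

doc <- read_xml(sample_xml)

# The single <point> node carries the coordinates as attributes.
point <- xml_find_first(doc, "//point")

# One row per forecast hour; the length-one coordinates are recycled.
forecast <- data.frame(
  latitude = as.numeric(xml_attr(point, "latitude")),
  longitude = as.numeric(xml_attr(point, "longitude")),
  start_valid_time = xml_text(xml_find_all(doc, "//start-valid-time")),
  hourly_temperature = as.integer(xml_text(
    xml_find_all(doc, "//temperature[@type='hourly']/value"))),
  stringsAsFactors = FALSE
)

print(forecast)
```

    This only works when the time and temperature vectors have the same length, so it is worth checking both lengths before calling data.frame() on real data.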
    
  • 2020-11-22 16:50

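    Use XPath more directly, for both performance and clarity. The code below relies on the XML package, whose [ and [[ operators evaluate XPath expressions on a document parsed with xmlParse():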

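    library(XML)
    # data must be an XML-package document, not an xml2 one
    data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
    
    time_path <- "//start-valid-time"
    temp_path <- "//temperature[@type='hourly']/value"
    
    df <- data.frame(
        latitude=data[["number(//point/@latitude)"]],
        longitude=data[["number(//point/@longitude)"]],
        start_valid_time=sapply(data[time_path], xmlValue),
        hourly_temperature=as.integer(sapply(data[temp_path], xmlValue)))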
    

    leading to

    > head(df, 2)
      latitude longitude          start_valid_time hourly_temperature
    1    29.81    -82.42 2014-02-14T18:00:00-05:00                 60
    2    29.81    -82.42 2014-02-14T19:00:00-05:00                 55
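
    The same queries can also be spelled out with xpathSApply(), which some readers find more explicit than the [ and [[ shorthand. This is a sketch only: it runs on a small inline document that imitates the feed (sample_xml, doc and df_sketch are illustrative names, and the values are invented).

```r
library(XML)

# Hypothetical, heavily trimmed stand-in for the DWML feed (values made up).
sample_xml <- paste0(
  '<dwml><data>',
  '<location><point latitude="29.81" longitude="-82.42"/></location>',
  '<time-layout>',
  '<start-valid-time>2014-02-14T18:00:00-05:00</start-valid-time>',
  '<start-valid-time>2014-02-14T19:00:00-05:00</start-valid-time>',
  '</time-layout>',
  '<parameters>',
  '<temperature type="hourly"><value>60</value><value>55</value></temperature>',
  '<temperature type="dew point"><value>41</value><value>40</value></temperature>',
  '</parameters>',
  '</data></dwml>')

doc <- xmlParse(sample_xml, asText = TRUE)

# Each column is one XPath query; xmlGetAttr reads an attribute and
# xmlValue reads the text content of every matched node.
df_sketch <- data.frame(
  latitude = as.numeric(xpathSApply(doc, "//point", xmlGetAttr, "latitude")),
  longitude = as.numeric(xpathSApply(doc, "//point", xmlGetAttr, "longitude")),
  start_valid_time = xpathSApply(doc, "//start-valid-time", xmlValue),
  hourly_temperature = as.integer(
    xpathSApply(doc, "//temperature[@type='hourly']/value", xmlValue)),
  stringsAsFactors = FALSE
)

print(df_sketch)
```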
    
  • 2020-11-22 16:54

    Data in XML format are rarely organized in a way that would allow the xmlToDataFrame function to work. You're better off extracting everything in lists and then binding the lists together in a data frame:

    library(XML)
    data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
    
    xml_data <- xmlToList(data)
    

    In the case of your example data, getting location and start time is fairly straightforward:

    location <- as.list(xml_data[["data"]][["location"]][["point"]])
    
    start_time <- unlist(xml_data[["data"]][["time-layout"]][
        names(xml_data[["data"]][["time-layout"]]) == "start-valid-time"])
    

    Temperature data is a bit more complicated. First you need to get to the node that contains the temperature lists. Then you need to extract both lists, look within each one, and pick the one that has "hourly" as one of its values. Finally, select that list and keep only the elements labeled "value":

    temps <- xml_data[["data"]][["parameters"]]
    temps <- temps[names(temps) == "temperature"]
    temps <- temps[sapply(temps, function(x) any(unlist(x) == "hourly"))]
    temps <- unlist(temps[[1]][sapply(temps, names) == "value"])
    
    out <- data.frame(
      as.list(location),
      "start_valid_time" = start_time,
      "hourly_temperature" = temps)
    
    head(out)
      latitude longitude          start_valid_time hourly_temperature
    1    29.81    -82.42 2013-06-19T16:00:00-04:00                 91
    2    29.81    -82.42 2013-06-19T17:00:00-04:00                 90
    3    29.81    -82.42 2013-06-19T18:00:00-04:00                 89
    4    29.81    -82.42 2013-06-19T19:00:00-04:00                 85
    5    29.81    -82.42 2013-06-19T20:00:00-04:00                 83
    6    29.81    -82.42 2013-06-19T21:00:00-04:00                 80
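
    To see why the name-based filtering is needed, it can help to run the temperature steps on a tiny document first. The sketch below uses an invented sample (sample_xml, params, is_hourly and hourly are illustrative names) and assumes that xmlToList() stores a node's attributes in an element named ".attrs", which is what it does by default as far as I can tell.

```r
library(XML)

# Hypothetical sample with two <temperature> blocks, as in the feed (values made up).
sample_xml <- paste0(
  '<dwml><data><parameters>',
  '<temperature type="hourly"><value>60</value><value>55</value></temperature>',
  '<temperature type="dew point"><value>41</value><value>40</value></temperature>',
  '</parameters></data></dwml>')

xml_data <- xmlToList(xmlParse(sample_xml, asText = TRUE))
params <- xml_data[["data"]][["parameters"]]

# Both children share the same name, so [["temperature"]] alone would
# only ever return the first one.
print(names(params))

# Keep the <temperature> entries and flag the one whose type is "hourly".
temps <- params[names(params) == "temperature"]
is_hourly <- vapply(
  temps,
  function(x) x[[".attrs"]][["type"]] == "hourly",
  logical(1))

# Within that entry, keep only the elements named "value".
hourly <- temps[[which(is_hourly)[1]]]
hourly_temperature <- as.integer(
  unlist(hourly[names(hourly) == "value"], use.names = FALSE))

print(hourly_temperature)
```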
    
  • 2020-11-22 16:54

    You can try the code below:

    # Load the packages required to read XML files.
    library("XML")
    library("methods")
    
    # Convert the input xml file to a data frame.
    xmldataframe <- xmlToDataFrame("input.xml")
    print(xmldataframe)
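
    xmlToDataFrame() works best when the file is a flat list of records, each with the same simple child elements. A sketch of that shape, using an invented inline document in place of input.xml (flat_xml, flat_df and the record contents are made up for illustration):

```r
library(XML)

# Hypothetical flat document: every <record> has the same leaf children.
flat_xml <- paste0(
  "<records>",
  "<record><city>Gainesville</city><temp>60</temp></record>",
  "<record><city>Ocala</city><temp>55</temp></record>",
  "</records>")

doc <- xmlParse(flat_xml, asText = TRUE)

# Each <record> becomes a row and each child element a column.
flat_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Columns come back as text, so convert numeric ones explicitly.
flat_df$temp <- as.integer(flat_df$temp)

print(flat_df)
```

    Nested documents such as the weather feed in the question do not have this shape, which is why the other answers extract each column separately.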
    