R: How to get parent attributes and node values at the same time?

悲&欢浪女 2020-12-07 02:36

I have an HTML document and R code like these, and I need to relate each node value to its parent's id in a data.frame. Different information is available for each person.

1 Answer
  • 2020-12-07 03:04

    In general it's not going to be easy:

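    library(XML)
    # doc is the document parsed from the question's HTML (e.g. with htmlParse())
    idNodes <- getNodeSet(doc, "//div[@id]")                                     # every parent div that has an id
    ids <- lapply(idNodes, function(x) xmlAttrs(x)['id'])                        # the parent's id
    values <- lapply(idNodes, xpathApply, path = './div[@class]', xmlValue)      # text of each classed child div
    attributes <- lapply(idNodes, xpathApply, path = './div[@class]', xmlAttrs)  # class attribute of each child div
    do.call(rbind.data.frame, mapply(cbind, ids, values, attributes))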
      V1              V2    V3
    1  1        555-5555 phone
    2  1    jhon@123.com email
    3  2        123-4567 phone
    4  2 maria@gmail.com email
    5  3        987-6543 phone
    6  3              32   age
    7  3        New York  city
    

    The above gives you attribute/value pairs, assuming they are nested in a div that has an id attribute.
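    For a self-contained check, here is a minimal sketch of the same idea run against a small hypothetical HTML snippet. The markup is an assumption reconstructed from the output above, not the asker's original HTML. Instead of cbind on lists, it builds one data.frame per parent with vapply, so every column comes out as a plain character vector.

```r
library(XML)

# Hypothetical markup (assumption), inferred from the output above
html <- '<html><body>
  <div id="1"><div class="phone">555-5555</div><div class="email">jhon@123.com</div></div>
  <div id="2"><div class="phone">123-4567</div><div class="email">maria@gmail.com</div></div>
  <div id="3"><div class="phone">987-6543</div><div class="age">32</div><div class="city">New York</div></div>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# One data.frame chunk per parent <div> that carries an id attribute
idNodes <- getNodeSet(doc, "//div[@id]")
chunks <- lapply(idNodes, function(node) {
  kids <- getNodeSet(node, "./div[@class]")  # direct child divs that have a class
  data.frame(
    id        = xmlGetAttr(node, "id"),      # recycled to one row per child
    attribute = vapply(kids, xmlGetAttr, character(1), name = "class"),
    value     = vapply(kids, xmlValue, character(1)),
    stringsAsFactors = FALSE
  )
})
res <- do.call(rbind, chunks)
print(res)
```

    The relative XPath "./div[@class]" is evaluated against each parent node, so a child's value can never be matched to the wrong id.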

    UPDATE: If you want to wrap it all in a single xpathApply-style call:

    library(XML)
    # Build a small data.frame for one parent div: its id plus each child's class and text
    utilFun <- function(x){
      id <- xmlGetAttr(x, 'id')
      # element children only; whitespace text nodes are skipped
      kids <- xmlChildren(x, omitNodeTypes = "XMLInternalTextNode")
      values <- sapply(kids, xmlValue)
      attributes <- sapply(kids, xmlAttrs)
      data.frame(id = id, attributes = attributes, values = values, stringsAsFactors = FALSE)
    }
    # apply utilFun to every div with an id, then stack the pieces
    res <- xpathApply(doc, '//div[@id]', utilFun)
    do.call(rbind, res)
      id attributes          values
    1  1      phone        555-5555
    2  1      email    jhon@123.com
    3  2      phone        123-4567
    4  2      email maria@gmail.com
    5  3      phone        987-6543
    6  3        age              32
    7  3       city        New York
    
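    One possible refinement, sketched here as an assumption rather than part of the original answer: a parent div with no classed children gives nothing useful to build a data.frame from, so a variant of utilFun (the name utilFunSafe and the sample markup are hypothetical) can return NULL in that case. rbind() on data frames drops zero-length arguments, so such parents are simply skipped.

```r
library(XML)

# Hypothetical markup (assumption): the div with id="4" has no classed children
html <- '<html><body>
  <div id="1"><div class="phone">555-5555</div><div class="email">jhon@123.com</div></div>
  <div id="4"></div>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Hypothetical variant of utilFun: returns NULL when there is nothing to report
utilFunSafe <- function(x) {
  kids <- getNodeSet(x, "./div[@class]")  # direct child divs that have a class
  if (length(kids) == 0) return(NULL)
  data.frame(
    id         = xmlGetAttr(x, "id"),
    attributes = vapply(kids, xmlGetAttr, character(1), name = "class"),
    values     = vapply(kids, xmlValue, character(1)),
    stringsAsFactors = FALSE
  )
}

# NULL pieces are ignored when the data frames are stacked
res <- do.call(rbind, xpathApply(doc, "//div[@id]", utilFunSafe))
print(res)
```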