I have an HTML document and R code like these, and I need to relate each node value to its parent id in a data.frame. Different pieces of information are available for each person.
In general it's not going to be easy:
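library(XML)  # provides getNodeSet, xpathApply, xmlAttrs, xmlValue
# doc is the parsed HTML document (for example, the result of htmlParse())
idNodes <- getNodeSet(doc, "//div[@id]")
ids <- lapply(idNodes, function(x) xmlAttrs(x)['id'])
values <- lapply(idNodes, xpathApply, path = './div[@class]', xmlValue)
attributes <- lapply(idNodes, xpathApply, path = './div[@class]', xmlAttrs)
# SIMPLIFY = FALSE keeps one matrix per id, even if every id has the same number of children
do.call(rbind.data.frame, mapply(cbind, ids, values, attributes, SIMPLIFY = FALSE))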
  V1              V2    V3
1  1        555-5555 phone
2  1    jhon@123.com email
3  2        123-4567 phone
4  2 maria@gmail.com email
5  3        987-6543 phone
6  3              32   age
7  3        New York  city
The above will give you attribute and value pairs, assuming they are nested in a div with an associated id.
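For a reproducible run, here is a minimal sketch of the same idea. The HTML string is an assumption, reconstructed from the output shown above rather than taken from the original question, and the column names `field` and `value` are just illustrative. It builds one small data.frame per id node and stacks them.

```r
library(XML)

# Hypothetical input, reconstructed from the printed output (not the asker's real file)
html <- '
<div id="1">
  <div class="phone">555-5555</div>
  <div class="email">jhon@123.com</div>
</div>
<div id="2">
  <div class="phone">123-4567</div>
  <div class="email">maria@gmail.com</div>
</div>
<div id="3">
  <div class="phone">987-6543</div>
  <div class="age">32</div>
  <div class="city">New York</div>
</div>'

doc <- htmlParse(html, asText = TRUE)

# One node per person: every div that carries an id attribute
idNodes <- getNodeSet(doc, "//div[@id]")

# For each person, pair the parent id with the class and text of each child div
rows <- lapply(idNodes, function(node) {
  data.frame(
    id    = xmlGetAttr(node, "id"),
    field = xpathSApply(node, "./div[@class]", xmlGetAttr, "class"),
    value = xpathSApply(node, "./div[@class]", xmlValue),
    stringsAsFactors = FALSE
  )
})

result <- do.call(rbind, rows)
print(result)
```

Because the id is a length-one value, data.frame() recycles it across however many child divs that person has, which is what ties each value back to its parent.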
UPDATE: If you want to wrap it in an xpathApply-type call:
utilFun <- function(x){
  id <- xmlGetAttr(x, 'id')
  values <- sapply(xmlChildren(x, omitNodeTypes = "XMLInternalTextNode"), xmlValue)
  attributes <- sapply(xmlChildren(x, omitNodeTypes = "XMLInternalTextNode"), xmlAttrs)
  data.frame(id = id, attributes = attributes, values = values, stringsAsFactors = FALSE)
}
res <- xpathApply(doc, '//div[@id]', utilFun)
do.call(rbind, res)
  id attributes          values
1  1      phone        555-5555
2  1      email    jhon@123.com
3  2      phone        123-4567
4  2      email maria@gmail.com
5  3      phone        987-6543
6  3        age              32
7  3       city        New York
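Since not every person has the same fields, you may prefer one row per id with NA where a field is missing. This is a sketch using base R's reshape(), not something the question asked for. The `res_long` data.frame is typed in by hand from the output above so the snippet runs on its own, and the names `res_long` and `res_wide` are only illustrative.

```r
# Long-format result, copied by hand from the printed output above
res_long <- data.frame(
  id         = c("1", "1", "2", "2", "3", "3", "3"),
  attributes = c("phone", "email", "phone", "email", "phone", "age", "city"),
  values     = c("555-5555", "jhon@123.com", "123-4567", "maria@gmail.com",
                 "987-6543", "32", "New York"),
  stringsAsFactors = FALSE
)

# One row per id, one column per attribute; missing fields become NA
res_wide <- reshape(res_long, idvar = "id", timevar = "attributes",
                    direction = "wide")

# reshape() prefixes the new columns with "values."; strip that prefix
names(res_wide) <- sub("^values\\.", "", names(res_wide))

print(res_wide)
```

The long format from the answer is usually easier to build, and going wide afterwards keeps the parsing step independent of which fields happen to exist.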