I'm working with XML files from clinicaltrials.gov, which have a structure like this:
...
...
This code puts the subset of nodes that correspond to the locations
of a clinical trial into a data frame:
library(XML)
clinicalTrialUrl <- "http://clinicaltrials.gov/ct2/show/NCT01480479?resultsxml=true"
xmlDoc <- xmlParse(clinicalTrialUrl, useInternalNodes=TRUE)
locations <- xmlToDataFrame(getNodeSet(xmlDoc,"//location"))
In this case there are 221 locations. However, xmlToDataFrame assumes a flat structure and lumps subnodes together. For example, everything under a nested child element of a location
gets concatenated into a single string. I could go into the subnodes and put them one by one into a data frame.
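One way to go into the subnodes without handling each by hand is to collect every leaf element under a location and use it as a column. The following is only a minimal sketch of that idea, not a definitive solution. The inline sample XML and its element names (`facility`, `address`, `city`, `zip`, `country`, `status`) are assumptions made for illustration and are not copied from the real file, and `flattenNode` is a hypothetical helper name.

```r
library(XML)

# Hypothetical sample document; element names are assumptions for illustration.
sampleXml <- '<clinical_study>
  <location>
    <facility>
      <name>Site A</name>
      <address><city>Boston</city><country>United States</country></address>
    </facility>
    <status>Recruiting</status>
  </location>
  <location>
    <facility>
      <name>Site B</name>
      <address><city>Lyon</city><zip>69000</zip><country>France</country></address>
    </facility>
  </location>
</clinical_study>'

doc <- xmlParse(sampleXml, asText = TRUE)

# Hypothetical helper: turn one node into a named character vector
# with one entry per leaf element (an element that has no child elements).
flattenNode <- function(node) {
  leaves <- getNodeSet(node, ".//*[not(*)]")
  values <- vapply(leaves, xmlValue, character(1))
  names(values) <- vapply(leaves, xmlName, character(1))
  values
}

rows <- lapply(getNodeSet(doc, "//location"), flattenNode)

# Use the union of all leaf names as columns, so that a location
# missing a given leaf gets NA in that column.
cols <- unique(unlist(lapply(rows, names)))
locations <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) unname(r[cols]))),
  stringsAsFactors = FALSE
)
names(locations) <- cols
```

In this sketch columns are named by the leaf element name alone, so if two leaves under one location share a name, only the first is kept. Building the column name from the leaf's ancestors as well would avoid that collision.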