I'm querying data from an XML-based API. The API responses are paginated, so I have to make a bunch of queries to get the full data set.
Using read_xml
fro
After some trial and error, I've figured out how to do this with the xml2
package.
Consider the simple case of two small XML documents we'd like to combine.
library(xml2)
doc1 <- read_xml("<items><item>1</item><item>2</item></items>")
doc2 <- read_xml("<items><item>3</item><item>4</item></items>")
(Note: where the documents come from doesn't matter; the argument to read_xml
can be anything it can read, such as a literal XML string, a file path, a URL, or a connection.)
To combine them together, simply do the following:
doc2children <- xml_children(doc2)
for (child in doc2children) {
  xml_add_child(doc1, child)
}
Now when you look at doc1 you should see this:
> doc1
{xml_document}
<items>
[1] <item>1</item>
[2] <item>2</item>
[3] <item>3</item>
[4] <item>4</item>
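Since the question involves many paginated responses rather than just two, the same loop can be wrapped in a small helper. This is only a sketch: combine_pages and the in-memory pages list are hypothetical names made up for illustration, and the literal XML strings stand in for whatever the API actually returns.

```r
library(xml2)

# Hypothetical helper: append the children of every later page to the first page.
# xml2 documents are modified in place, so pages[[1]] itself grows.
combine_pages <- function(pages) {
  combined <- pages[[1]]
  for (page in pages[-1]) {
    for (child in xml_children(page)) {
      xml_add_child(combined, child)
    }
  }
  combined
}

# Stand-ins for paginated API responses (assumed to share the same root element)
pages <- list(
  read_xml("<items><item>1</item><item>2</item></items>"),
  read_xml("<items><item>3</item><item>4</item></items>"),
  read_xml("<items><item>5</item></items>")
)

combined <- combine_pages(pages)
item_text <- xml_text(xml_find_all(combined, "//item"))
print(item_text)
```

In practice each element of pages would come from one read_xml call per page of the API, collected with lapply or a loop.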
Consider using the XML package: initialize an empty document with a <root>
node, then iteratively append the content of each API response, selected from
that response's own root, using addChildren().
library(XML)
doc = newXMLDoc()
root = newXMLNode("root", doc = doc)
# LOOP THROUGH 50 REQUESTS
lapply(seq(50), function(i) {
  # PARSE ALL CONTENT
  tmp <- xmlParse("/path/to/API/call")
  # APPEND FROM API XML ROOT
  addChildren(root, getNodeSet(tmp, '/apixmlroot'))
})
# SAVE TO FILE OR USE doc FOR FURTHER WORK
saveXML(doc, file="/path/to/output.xml")
I could not find a counterpart method in xml2, as its xml_add_child
appears to require a character string rather than node(s).