Question
I'm trying to process an XML document in parallel in R using the xml2 package and the foreach function, but I get "Error in node_attrs(x$node, nsMap = ns) : external pointer is not valid". I tried exporting the tree with clusterExport.
Example code:
library(xml2)
library(foreach)
library(doParallel)
x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")
nCores <- detectCores()
cl <- makeCluster(nCores)
clusterExport(cl, varlist = "x")
registerDoParallel(cl)
foreach(yy = xml_find_all(x, "/x/yy")) %dopar%
  yy
stopCluster(cl)
I don't understand how to avoid this error.
Answer 1:
xml2 objects (here passed via yy) cannot be exported to other R processes because they hold "external pointers" that are valid only in the R process (here, the main R session) in which they were created. Once exported, those external pointers are useless in the background R processes (the workers), i.e. they are "not valid".
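The effect can be reproduced without a cluster. The following is a minimal sketch, assuming that a serialize()/unserialize() round trip is a fair stand-in for what happens when an object is shipped to a worker; the variable names x_copy and res are illustrative only.

```r
library(xml2)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

# Round-trip the document through R's serialization, roughly what happens
# when an object is exported to another R process (assumption of this sketch).
x_copy <- unserialize(serialize(x, connection = NULL))

# The copy keeps its class, but its external pointers no longer refer to
# any parsed document, so using it should fail.
res <- tryCatch(xml_name(x_copy), error = function(e) e)

if (inherits(res, "error")) {
  message("Failed as expected: ", conditionMessage(res))
}
```

The original x remains usable in the main session; only the copy is affected.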
You can read a bit more about this in Section 'Non-exportable objects' of the 'A Future for R: Common Issues with Solutions' vignette.
The only parallel solution I am aware of is to keep all xml2 processing local to each worker, e.g.:
res <- foreach(file = files, .packages = "xml2") %dopar% {
  # Parse inside the worker so the external pointers belong to that process
  x <- read_xml(file)
  lapply(xml_find_all(x, "/x/yy"), ...)
}
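When the data is a single in-memory document rather than a set of files, one possible variant of the same idea is to send plain character strings to the workers and parse them there. This is a sketch under assumptions, not part of the original answer: it serializes each node with as.character() in the main session, uses an arbitrary cluster size of 2, and the names yy_strings, s and node are illustrative.

```r
library(xml2)
library(foreach)
library(doParallel)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

# Serialize each matching node to a character string in the main session.
# Character vectors carry no external pointers, so they can be exported.
yy_strings <- as.character(xml_find_all(x, "/x/yy"))

cl <- makeCluster(2)  # cluster size chosen arbitrarily for this sketch
registerDoParallel(cl)

res <- foreach(s = yy_strings, .packages = "xml2") %dopar% {
  node <- read_xml(s)  # re-parse on the worker; pointers are local to it
  xml_text(node)       # return an ordinary R value, not an xml2 object
}

stopCluster(cl)
```

Each worker has to return ordinary R values (here, character strings), since an xml2 object sent back to the main session would hit the same invalid-pointer problem. Re-parsing has a cost, and each fragment loses the context of its parent document, so this is likely only worthwhile when the per-node work is substantial.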
Source: https://stackoverflow.com/questions/55810140/parallel-processing-xml-nodes-with-r