Parallel processing XML nodes with R

Submitted by 不羁的心 on 2021-02-08 10:25:45

Question


I'm trying to process an XML document in parallel in R using the xml2 package and the foreach function, but I get "Error in node_attrs(x$node, nsMap = ns) : external pointer is not valid". I tried exporting the tree with clusterExport, but the error remains.

Example code:

library(xml2)
library(foreach)
library(doParallel)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

nCores <- detectCores()
cl <- makeCluster(nCores)
clusterExport(cl, varlist = "x")
registerDoParallel(cl)

foreach(yy = xml_find_all(x, "/x/yy")) %dopar%
  yy

stopCluster(cl)

I don't understand how to avoid this error.


Answer 1:


xml2 objects (here passed via yy) cannot be exported to other R processes because they hold "external pointers" that are valid only in the R process (the main R session) in which they were created. If exported, those external pointers are useless on the background R processes (the workers), i.e. they are "not valid".

You can read a bit more about this in Section 'Non-exportable objects' of the 'A Future for R: Common Issues with Solutions' vignette.
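The effect can be reproduced without a cluster. The following is a minimal sketch, assuming that a serialize()/unserialize() round trip in a single session is a fair stand-in for what happens when an object is shipped to a worker; the variable names copy and res are illustrative only.

```r
library(xml2)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

# In the session that created it, the document works as expected.
xml_name(x)

# Simulate sending the object to another process: the R-level wrapper
# survives the round trip, but the external pointer inside it does not.
copy <- unserialize(serialize(x, NULL))

# Using the copy should fail; capture the error instead of stopping.
res <- tryCatch(xml_name(copy), error = function(e) e)
print(res)
```

If the assumption holds, res is an error condition rather than a node name, which mirrors what the workers see.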

The only parallel solution I am aware of is to keep all xml2 processing local to each worker, so that every document is parsed and queried within the same process, e.g.

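# 'files' is assumed to be a character vector of paths to XML files
res <- foreach(file = files, .packages = "xml2") %dopar% {
   # parse on the worker, so the external pointers are created there
   x <- read_xml(file)
   lapply(xml_find_all(x, "/x/yy"), ...)
}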
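For a single document that is already loaded, as in the question, one possible variation on the same idea is to send plain character strings to the workers and parse them there. This is a sketch rather than part of the original answer: it assumes each node can be processed on its own, and node_strings is a name introduced here for illustration.

```r
library(xml2)
library(foreach)
library(doParallel)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

# Turn each node into an ordinary character string in the main session.
# Strings, unlike external pointers, can be exported to workers.
node_strings <- vapply(xml_find_all(x, "/x/yy"), as.character, character(1))

cl <- makeCluster(2)  # two workers are enough for this small example
registerDoParallel(cl)

# Each worker re-parses its string, so the xml2 object it works with
# is created in that worker's own process.
res <- foreach(s = node_strings, .packages = "xml2") %dopar% {
  node <- read_xml(s)
  xml_text(node)
}

stopCluster(cl)

print(res)
```

Serializing and re-parsing has a cost, so this is likely to pay off only when the work done per node is substantial.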


Source: https://stackoverflow.com/questions/55810140/parallel-processing-xml-nodes-with-r
