Error while parsing a very large (10 GB) XML file in R, using the XML package

对着背影说爱祢 posted on 2019-12-12 03:49:54

Question


Context
I'm currently working on a project involving OSM (OpenStreetMap) data. To manipulate geographic objects, I have to convert the data (an OSM XML file) into an R object. The osmar package is meant to do this, but it fails while parsing the raw XML data.

The error

Error in paste(file, collapse = "\n") : result would exceed 2^31-1 bytes

The code

require(osmar)
osmar_obj <- get_osm("anything", source = osmsource_file("my filename"))

Inside the get_osm function, the code calls ret <- xmlParse(raw), which triggers the error after a few seconds.

The question
How am I supposed to read a large XML file (here 10 GB), given that I have 64 GB of memory?

Thanks a lot!


Answer 1:


This is the solution I came up with, even though it is not 100% satisfying.

  1. In your shell, transform the .osm file by removing every newline except the last one.
  2. Run the exact same code as before, skipping the paste call, which is no longer needed (you just did the equivalent in the shell).
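
Step 1 could be sketched as follows. This is only a minimal illustration, not necessarily the exact command used in the answer: the file names are hypothetical, the sample file is a tiny stand-in for the real 10 GB file, and it assumes that no newline in the data is significant (for example, that none is the only separator between two attributes).

```shell
# Hypothetical file names; replace them with your own paths.
input="sample.osm"
output="sample_oneline.osm"

# Create a tiny stand-in for the real file so the sketch runs on its own.
printf '<osm>\n<node id="1"/>\n<node id="2"/>\n</osm>\n' > "$input"

# Delete every newline, then append a single trailing newline.
# tr streams its input, so the whole file is never held in memory.
{ tr -d '\n' < "$input"; printf '\n'; } > "$output"

# Count the lines in the result; it should now be a single line.
wc -l < "$output"
```

Because tr processes the input as a stream, this step should not hit a memory limit even on a very large file.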

Profit :)

Obviously, I'm not very happy with it, because modifying the data file in the shell is more of a trick than an actual solution :(



Source: https://stackoverflow.com/questions/38526562/error-while-parsing-a-very-large-10-gb-xml-file-in-r-using-the-xml-package
