Raw (binary) data too big to write to disk. How to write chunk-wise to disk (appending)?

Submitted by 帅比萌擦擦* on 2019-12-12 18:14:43

Question


I have a large raw vector in R (i.e. array of binary data) that I want to write to disk, but I'm getting an error telling me the vector is too large. Here's a reproducible example and the error I get:

> writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin")

Error in writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin") : 
  long vectors not supported yet: connections.c:4147

I've noticed that this is tied to a 2 GB limit: writeBin() refuses "long vectors" (2^31 or more elements) when writing through a connection. If I try to write a single byte less (1024 * 1024 * 1024 * 2 - 1), it works just fine.
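The arithmetic behind that boundary: the failing size is exactly 2^31 bytes, one past R's 2^31 - 1 limit for ordinary (non-long) vectors.

```r
# 2 GiB is exactly 2^31 bytes -> one element too many for writeBin()
# through a connection; 2^31 - 1 is the largest size that succeeds.
failing_size <- 1024 * 1024 * 1024 * 2      # 2147483648 = 2^31
working_size <- 1024 * 1024 * 1024 * 2 - 1  # 2147483647 = 2^31 - 1
failing_size == 2^31       # TRUE
working_size == 2^31 - 1   # TRUE
```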

I was thinking about doing some kind of workaround, where I write chunks of the large file to disk in batches, only appending the binary data to the disk, like this:

large_file = raw(1024 * 1024 * 1024 * 2)
chunk_size = 1024 * 1024 * 512
n_chunks = ceiling(length(large_file) / chunk_size)

for (i in 1:n_chunks)
{
  start_byte = ((i - 1) * chunk_size) + 1
  end_byte = start_byte + chunk_size - 1
  if (i == n_chunks)
    end_byte = length(large_file)
  this_chunk = large_file[start_byte:end_byte]
  appendBin(this_chunk, "test.bin") # <-- non-existent magical function!
}

But I can't find any function like the "appendBin" I invented above, nor any documentation in R that tells me how to append data straight to a file on disk.

So my question boils down to this: does anyone know how to append raw (binary) data to a file already on disk, without first having to read the full file into memory?

Extra details: I'm currently using R 3.4.2 (64-bit) on a Windows 10 PC with 192 GB of RAM. I tried on another PC (R 3.5, 64-bit, Windows 8 with 8 GB of RAM) and had the exact same problem.

Any kind of insight or workaround would be greatly appreciated!!!

Thank you!


Answer 1:


Thanks to @MichaelChirico and @user2554330, I was able to figure out a workaround. Essentially, I just need to open the file as a connection in "a+b" (append, binary) mode and pass that connection to writeBin(), instead of a file name.

Here's a copy of the working code.

large_file = raw(1024 * 1024 * 1024 * 3)
chunk_size = 1024 * 1024 * 512
n_chunks = ceiling(length(large_file) / chunk_size)

# Start from a clean file, since append mode adds to whatever is already there
if (file.exists("test.bin"))
  file.remove("test.bin")

for (i in 1:n_chunks)
{
  start_byte = ((i - 1) * chunk_size) + 1
  end_byte = start_byte + chunk_size - 1
  if (i == n_chunks)
    end_byte = length(large_file)
  this_chunk = large_file[start_byte:end_byte]
  # Open in append + binary mode and pass the connection, not the file name
  output_file = file(description = "test.bin", open = "a+b")
  writeBin(this_chunk, output_file)
  close(output_file)
}

I know it's ugly that I'm opening and closing the file on every iteration, but doing so kept the error from coming back even with larger files.
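For what it's worth, the same idea can be packaged as a helper that opens a single "ab" (append, binary) connection once and streams every chunk through it. This is a sketch, not the answer's exact code: the function name write_raw_chunked is made up here, and it assumes each chunk stays under writeBin()'s 2^31 - 1 element limit.

```r
# Sketch: write a large raw vector to disk in chunks through one
# append-mode connection, so the whole vector never hits writeBin() at once.
write_raw_chunked <- function(x, path, chunk_size = 1024 * 1024 * 512) {
  if (file.exists(path)) file.remove(path)  # start from a clean file
  con <- file(path, open = "ab")            # append + binary, opened once
  on.exit(close(con))                       # always close, even on error
  n_chunks <- ceiling(length(x) / chunk_size)
  for (i in seq_len(n_chunks)) {
    start_byte <- (i - 1) * chunk_size + 1
    end_byte <- min(start_byte + chunk_size - 1, length(x))
    writeBin(x[start_byte:end_byte], con)
  }
  invisible(path)
}

# Small-scale check that the bytes round-trip intact
x <- as.raw(sample(0:255, 1000, replace = TRUE))
write_raw_chunked(x, "small_test.bin", chunk_size = 300)
identical(readBin("small_test.bin", "raw", n = length(x)), x)  # should be TRUE
```

Whether keeping the connection open for the full multi-gigabyte write avoids the problem the repeated open/close was working around is something I'd verify on the real data first.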

Thanks again for the insights, guys! =)



Source: https://stackoverflow.com/questions/50297237/raw-binary-data-too-big-to-write-to-disk-how-to-write-chunk-wise-to-disk-app
