Question
I have a large raw vector in R (i.e. array of binary data) that I want to write to disk, but I'm getting an error telling me the vector is too large. Here's a reproducible example and the error I get:
> writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin")
Error in writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin") :
long vectors not supported yet: connections.c:4147
I've noticed that this is linked to R's 2^31 - 1 element limit for vectors passed to connections, which for a raw vector works out to just under 2 GB. If I try to write a single byte less (1024 * 1024 * 1024 * 2 - 1), it works just fine.
I was thinking about doing some kind of workaround, where I write chunks of the large file to disk in batches, only appending the binary data to the disk, like this:
large_file = raw(1024 * 1024 * 1024 * 2)
chunk_size = 1024 * 1024 * 512
n_chunks = ceiling(length(large_file) / chunk_size)

for (i in 1:n_chunks)
{
    start_byte = ((i - 1) * chunk_size) + 1
    end_byte = start_byte + chunk_size - 1
    if (i == n_chunks)
        end_byte = length(large_file)
    this_chunk = large_file[start_byte:end_byte]
    appendBin(this_chunk, "test.bin") # <-- non-existent magical function!
}
But I can't find any function like the "appendBin" I imagined above, nor any documentation in R that tells me how to append data straight to disk.
So my question boils down to this: does anyone know how to append raw (binary) data to a file already on disk without having to read the full file into memory first?
Extra details: I'm currently using R version 3.4.2 (64-bit) on a Windows 10 PC with 192 GB of RAM. I tried on another PC (R version 3.5, 64-bit, Windows 8 with 8 GB of RAM) and hit exactly the same problem.
Any kind of insight or workaround would be greatly appreciated!!!
Thank you!
Answer 1:
Thanks to @MichaelChirico and @user2554330, I was able to figure out a workaround. Essentially, I just need to open the file in "a+b" mode (append, binary) as a new connection and pass that connection to the writeBin function.
Here's a copy of the working code.
large_file = raw(1024 * 1024 * 1024 * 3)
chunk_size = 1024 * 1024 * 512
n_chunks = ceiling(length(large_file) / chunk_size)

if (file.exists("test.bin"))
    file.remove("test.bin")

for (i in 1:n_chunks)
{
    start_byte = ((i - 1) * chunk_size) + 1
    end_byte = start_byte + chunk_size - 1
    if (i == n_chunks)
        end_byte = length(large_file)
    this_chunk = large_file[start_byte:end_byte]
    output_file = file(description = "test.bin", open = "a+b")
    writeBin(this_chunk, output_file)
    close(output_file)
}
I know it's ugly that I'm opening and closing the file multiple times, but that kept the error from popping up with even bigger files.
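For anyone who wants to try the same chunked-append approach without allocating gigabytes, here is a scaled-down, self-contained sketch. The file name "demo.bin" and the 1 MiB vector size are illustrative choices, not from the original; the structure (slice, open in "a+b", write, close) mirrors the loop above, and the final checks confirm the bytes on disk match the in-memory vector.

```r
# Scaled-down sketch of chunked appending with writeBin on an "a+b" connection.
large_file <- as.raw(sample(0:255, 1024 * 1024, replace = TRUE))
chunk_size <- 1024 * 256
n_chunks   <- ceiling(length(large_file) / chunk_size)

if (file.exists("demo.bin"))
    file.remove("demo.bin")

for (i in seq_len(n_chunks)) {
    start_byte <- (i - 1) * chunk_size + 1
    end_byte   <- min(start_byte + chunk_size - 1, length(large_file))
    con <- file(description = "demo.bin", open = "a+b")  # append, binary
    writeBin(large_file[start_byte:end_byte], con)
    close(con)
}

# The file should contain exactly the same bytes as the vector.
stopifnot(file.size("demo.bin") == length(large_file))
stopifnot(identical(readBin("demo.bin", what = "raw", n = length(large_file)),
                    large_file))
```

Using min() for the final chunk's end byte replaces the if (i == n_chunks) special case; both handle a last chunk that is shorter than chunk_size.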
Thanks again for the insights, guys! =)
Source: https://stackoverflow.com/questions/50297237/raw-binary-data-too-big-to-write-to-disk-how-to-write-chunk-wise-to-disk-app