I have tried different ways to create a large Hadoop SequenceFile with just one short (<100 bytes) key and one large (>1GB) value (BytesWritable).
The following sample works out of the box:
which writes multiple random-length keys and values with a total size >3GB.
However, that is not what I am trying to do, so I modified it using the Hadoop 2.2.0 API to something like:
Path file = new Path("/input");
SequenceFile.Writer writer = SequenceFile.createWriter(conf,
int numBytesToWrite = fileSizeInMB * 1024 * 1024;
BytesWritable randomKey = new BytesWritable();
BytesWritable randomValue = new BytesWritable();
randomValue.setSize(numBytesToWrite);
randomizeBytes(randomValue.getBytes(), 0, randomValue.getLength());
writer.append(randomKey, randomValue);
When fileSizeInMB > 700, I get errors like:
at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
I have seen this error discussed, but have not seen any resolution. Note that a Java int can hold values up to 2^31 - 1 (about 2GB), so it should not fail at 700MB.
If you know of an alternative way to create such a large-value SequenceFile, please advise. I tried other approaches, such as IOUtils.read from an InputStream into a byte[], but I ran into heap size limits or an OutOfMemoryError.
Just use ArrayPrimitiveWritable instead.
There is an int overflow when the new capacity is computed in BytesWritable here:
public void setSize(int size) {
  if (size > getCapacity()) {
    setCapacity(size * 3 / 2);
  }
  this.size = size;
}
700 MB * 3 > 2 GB = int overflow!
As a result, you cannot deserialize (but can write and serialize) more than 700 MB into a BytesWritable.
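To make the arithmetic concrete, here is a minimal stand-alone sketch that does not use Hadoop at all. It only reproduces the `size * 3 / 2` expression above in plain int arithmetic; the class and method names (`CapacityOverflowDemo`, `oldCapacity`, `LARGEST_SAFE_SIZE`) are made up for illustration.

```java
// Stand-alone illustration; hypothetical names, no Hadoop dependency.
final class CapacityOverflowDemo {

    // Largest size for which size * 3 still fits in a signed 32-bit int.
    static final int LARGEST_SAFE_SIZE = Integer.MAX_VALUE / 3;

    // Same expression as in the older setSize: evaluated entirely in int,
    // so size * 3 wraps around before the division by 2 happens.
    static int oldCapacity(int size) {
        return size * 3 / 2;
    }

    public static void main(String[] args) {
        int ok = 600 * 1024 * 1024;   // 600 MiB
        int bad = 700 * 1024 * 1024;  // 700 MiB

        System.out.println("600 MiB -> " + oldCapacity(ok));   // positive
        System.out.println("700 MiB -> " + oldCapacity(bad));  // negative
        System.out.println("largest safe size: " + LARGEST_SAFE_SIZE);
    }
}
```

The cut-off is `Integer.MAX_VALUE / 3`, which is 715827882 bytes, or roughly 683 MiB. Anything larger produces a negative capacity, which would be consistent with the failure inside setCapacity in the stack trace from the question.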
If you would like to use BytesWritable, one option is to set the capacity high enough beforehand, so that you can use the full 2GB and not only 700MB:
randomValue.setCapacity(numBytesToWrite);
randomValue.setSize(numBytesToWrite); // will not resize now
This bug has recently been fixed in Hadoop, so in newer versions it should work even without that:
public void setSize(int size) {
  if (size > getCapacity()) {
    // Avoid overflowing the int too early by casting to a long.
    long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
    setCapacity((int) newSize);
  }
  this.size = size;
}
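For comparison, a small stand-alone sketch of how the long-based expression above behaves for sizes that used to overflow. Again this is plain Java with no Hadoop dependency, and `FixedCapacityDemo` and `newCapacity` are hypothetical names used only for this illustration.

```java
// Stand-alone illustration; hypothetical names, no Hadoop dependency.
final class FixedCapacityDemo {

    // Same expression as in the fixed setSize: the multiplication is done
    // in long, and the result is clamped to Integer.MAX_VALUE.
    static int newCapacity(int size) {
        long newSize = Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
        return (int) newSize;
    }

    public static void main(String[] args) {
        int sevenHundredMiB = 700 * 1024 * 1024;
        int fifteenHundredMiB = 1500 * 1024 * 1024;

        // Grows by 1.5x as intended, no wrap-around.
        System.out.println(newCapacity(sevenHundredMiB));
        // 1.5x would exceed the int range, so the result is clamped.
        System.out.println(newCapacity(fifteenHundredMiB));
    }
}
```

With this formula the requested capacity is never negative: it is either 1.5 times the size or capped at `Integer.MAX_VALUE`.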