I have a BufferedWriter
as shown below:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new GZIPOutputStream( hdfs.cr
This is similar to the response by Olaseni, but I moved the counting into the BufferedOutputStream
rather than the GZIPOutputStream
, and this is more robust, since def.getBytesRead()
in Olaseni's answer is not available after the stream has been closed.
With the implementation below, you can supply your own AtomicLong
to the constructor so that you can assign the CountingBufferedOutputStream
in a try-with-resources block, but still retrieve the count after the block has exited (i.e. after the file is closed).
public static class CountingBufferedOutputStream extends BufferedOutputStream {
private final AtomicLong bytesWritten;
public CountingBufferedOutputStream(OutputStream out) throws IOException {
super(out);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize) throws IOException {
super(out, bufSize);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten)
throws IOException {
super(out, bufSize);
this.bytesWritten = bytesWritten;
}
@Override
public void write(byte[] b) throws IOException {
super.write(b);
bytesWritten.addAndGet(b.length);
}
@Override
public void write(byte[] b, int off, int len) throws IOException {
super.write(b, off, len);
bytesWritten.addAndGet(len);
}
@Override
public synchronized void write(int b) throws IOException {
super.write(b);
bytesWritten.incrementAndGet();
}
public long getBytesWritten() {
return bytesWritten.get();
}
}
You can use the CountingOutputStream from Apache commons IO library.
Place it between the GZIPOutputStream and the file Outputstream (hdfs.create(..)).
After writing the content to the file you can read the number of written bytes from the CountingOutputStream instance.
You can make you own descendant of OutputStream
and count how many time write
method was invoked
If this isn't too late and you are using 1.7+ and you don't wan't to pull in an entire library like Guava or Commons-IO, you can just extend the GZIPOutputStream and obtain the data from the associated Deflater like so:
public class MyGZIPOutputStream extends GZIPOutputStream {
public MyGZIPOutputStream(OutputStream out) throws IOException {
super(out);
}
public long getBytesRead() {
return def.getBytesRead();
}
public long getBytesWritten() {
return def.getBytesWritten();
}
public void setLevel(int level) {
def.setLevel(level);
}
}