What are the differences (if any) between the following two buffering approaches?
Reader r1 = new BufferedReader(new InputStreamReader(in, "UTF-8"), bufferSize);
Reader r2 = new InputStreamReader(new BufferedInputStream(in, bufferSize), "UTF-8");
FWIW, if you're opening a file in Java 8, you can use Files.newBufferedReader(Path). I don't know how its performance compares to the other solutions described here, but at least it pushes the decision of which construct to buffer into the JDK.
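A minimal sketch of that suggestion follows. The class name, the roundTrip helper and the temp-file prefix are made up for illustration. It writes a small temporary file and reads it back through Files.newBufferedReader, passing the charset explicitly (the one-argument overload decodes as UTF-8).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NewBufferedReaderSketch {

    // Hypothetical helper: writes the lines to a temp file, then reads them back.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path path = Files.createTempFile("q3459127-", ".txt");
            try {
                Files.write(path, lines, StandardCharsets.UTF_8);
                List<String> result = new ArrayList<>();
                // The JDK decides how to layer the decoder and the buffer here.
                try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        result.add(line);
                    }
                }
                return result;
            } finally {
                // Remove the temp file even if reading failed.
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("first line", "second line")));
    }
}
```

The try-with-resources block closes the reader, so no explicit finally/close pair is needed for it.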
r1 is more efficient. The InputStreamReader itself doesn't have a large buffer. The BufferedReader can be set to have a larger buffer than the InputStreamReader. The InputStreamReader in r2 would act as a bottleneck.
In a nutshell: you should read the data through a funnel, not through a bottle.
Update: here's a little benchmark program, just copy'n'paste'n'run it. You don't need to prepare files.
package com.stackoverflow.q3459127;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class Test {

    public static void main(String... args) throws Exception {

        // Init.
        int bufferSize = 10240; // 10KB.
        int fileSize = 100 * 1024 * 1024; // 100MB.
        File file = new File("/temp.txt");

        // Create file (it's also a good JVM warmup).
        System.out.print("Creating file .. ");
        BufferedWriter writer = null;
        try {
            writer = new BufferedWriter(new FileWriter(file));
            for (int i = 0; i < fileSize; i++) {
                writer.write("0");
            }
            System.out.printf("finished, file size: %d MB.%n", file.length() / 1024 / 1024);
        } finally {
            if (writer != null) try { writer.close(); } catch (IOException ignore) {}
        }

        // Read through funnel.
        System.out.print("Reading through funnel .. ");
        Reader r1 = null;
        try {
            r1 = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"), bufferSize);
            long st = System.nanoTime();
            for (int data; (data = r1.read()) > -1;);
            long et = System.nanoTime();
            System.out.printf("finished in %d ms.%n", (et - st) / 1000000);
        } finally {
            if (r1 != null) try { r1.close(); } catch (IOException ignore) {}
        }

        // Read through bottle.
        System.out.print("Reading through bottle .. ");
        Reader r2 = null;
        try {
            r2 = new InputStreamReader(new BufferedInputStream(new FileInputStream(file), bufferSize), "UTF-8");
            long st = System.nanoTime();
            for (int data; (data = r2.read()) > -1;);
            long et = System.nanoTime();
            System.out.printf("finished in %d ms.%n", (et - st) / 1000000);
        } finally {
            if (r2 != null) try { r2.close(); } catch (IOException ignore) {}
        }

        // Cleanup.
        if (!file.delete()) System.err.printf("Oops, failed to delete %s. Cleanup yourself.%n", file.getAbsolutePath());
    }

}
Results on my Latitude E5500 with a Seagate Momentus 7200.3 hard disk:
Creating file .. finished, file size: 99 MB.
Reading through funnel .. finished in 1593 ms.
Reading through bottle .. finished in 7760 ms.
r1 is also more convenient when you read a line-based stream, because BufferedReader supports the readLine method. You don't have to read content into a char array buffer or read chars one by one. However, you have to cast r1 to BufferedReader or declare the variable with that type explicitly.
I often use this code snippet:
BufferedReader br = ...
String line;
while ((line = br.readLine()) != null) {
    // process line
}
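For completeness, here is one way the loop above could be wrapped into something runnable. This is only a sketch: the class name, the readLines helper and the in-memory sample data are assumptions for illustration, and the stream is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadLineSketch {

    // Hypothetical helper: collects every line of a UTF-8 stream.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        // Declaring the variable as BufferedReader (not Reader) makes
        // readLine() available without a cast.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line); // process line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory sample data with mixed line terminators.
        byte[] data = "alpha\nbeta\r\ngamma".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(data)));
    }
}
```

readLine strips the line terminator, whether it is \n, \r or \r\n, so the collected strings contain only the line content.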
In response to Ross Studtman's question in the comment above (but also relevant to the OP):
BufferedReader reader = new BufferedReader(new InputStreamReader(new BufferedInputStream(inputStream), "UTF-8"));
The BufferedInputStream is superfluous (and likely harms performance due to extraneous copying). This is because the BufferedReader requests characters from the InputStreamReader in large chunks by calling InputStreamReader.read(char[], int, int), which in turn (through StreamDecoder) calls InputStream.read(byte[], int, int) to read a large block of bytes from the underlying InputStream.
You can convince yourself that this is so by running the following code:
new BufferedReader(new InputStreamReader(new ByteArrayInputStream("Hello world!".getBytes("UTF-8")) {
    @Override
    public synchronized int read() {
        System.err.println("ByteArrayInputStream.read()");
        return super.read();
    }
    @Override
    public synchronized int read(byte[] b, int off, int len) {
        System.err.println("ByteArrayInputStream.read(..., " + off + ", " + len + ')');
        return super.read(b, off, len);
    }
}, "UTF-8") {
    @Override
    public int read() throws IOException {
        System.err.println("InputStreamReader.read()");
        return super.read();
    }
    @Override
    public int read(char[] cbuf, int offset, int length) throws IOException {
        System.err.println("InputStreamReader.read(..., " + offset + ", " + length + ')');
        return super.read(cbuf, offset, length);
    }
}).read(); // read one character from the BufferedReader
You will see the following output:
InputStreamReader.read(..., 0, 8192)
ByteArrayInputStream.read(..., 0, 8192)
This demonstrates that the BufferedReader requests a large chunk of characters from the InputStreamReader, which in turn requests a large chunk of bytes from the underlying InputStream.
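The same point can be checked with counters instead of log output. The sketch below is an illustration, not part of the original experiment: CountingInputStream and countReads are hypothetical names, and the data is an in-memory array of ASCII zeros. It reads every character one at a time through a BufferedReader with no BufferedInputStream underneath, and records how the underlying stream was actually accessed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReadCallCounter {

    // Hypothetical wrapper that counts how the stream under the reader is accessed.
    static final class CountingInputStream extends FilterInputStream {
        int singleByteReads;
        int bulkReads;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteReads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return super.read(b, off, len);
        }
    }

    // Reads `size` ASCII characters one at a time through a BufferedReader.
    // Returns {charsRead, singleByteReads, bulkReads}.
    static int[] countReads(int size) {
        byte[] data = new byte[size];
        Arrays.fill(data, (byte) '0');
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        int chars = 0;
        try (Reader reader = new BufferedReader(
                new InputStreamReader(counting, StandardCharsets.UTF_8))) {
            while (reader.read() != -1) {
                chars++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { chars, counting.singleByteReads, counting.bulkReads };
    }

    public static void main(String[] args) {
        int[] r = countReads(100_000);
        System.out.printf("chars=%d, single-byte reads=%d, bulk reads=%d%n", r[0], r[1], r[2]);
    }
}
```

If the explanation above holds, the number of bulk reads should be tiny compared with the number of characters, even though the caller pulls characters out one by one.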