Note: I am currently coding in Java. I want to read input data into a string, one line at a time (or more), and I expect a large total number of lines.
Right now I have implemented
Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
separated = in.nextLine().split(" ");
...
}
because within the line my inputs are space delimited.
Unfortunately, with millions of lines this process is VERY slow and the scanner is taking up more time than my data processing, so I looked into the java.io libraries and found a bunch of possibilities, and I'm not sure which one to use (ByteArrayInputStream, FileInputStream, BufferedInputStream, PipedInputStream). Which one should I use?
To be specific, my data is being piped in from a text file. Every line has either 4 or 6 words and is terminated by a newline character. I need to analyze one line at a time, putting the (4 or 6) words into an array which I can temporarily manage. Data format:
392903840 a c b 293 32.90
382049804 a c 390
329084203 d e r 489 384.90
...
Is there a way for Scanner to read 1000 or so lines at a time and become efficient, or which of these classes should I use (to minimize run time)?
Sidenote: while experimenting I have tried:
java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReader(System.in));
while (stdin.ready()) {
separated = stdin.readLine().split(" ");
...
}
This worked well. I am just wondering which one works best, and whether there's any way to, say, read 100 lines into memory at once and then process everything. There are too many options, and I am looking for the optimal solution.
You should wrap your System.in with a BufferedInputStream like:
BufferedInputStream bis = new BufferedInputStream(System.in);
Scanner in = new Scanner(bis);
because the BufferedInputStream reduces the number of reads made on System.in, which raises efficiency.
Also, if you're only reading lines, you don't really need a Scanner, but a BufferedReader (which has a readLine() method that returns the next line, or null at end of input, and a ready() method that reports whether a read can proceed without blocking).
You would use it as such (see the example at java6 : InputStreamReader). I added a buffer size argument of 32*1024*1024 characters to the BufferedReader:
BufferedReader br = new BufferedReader(new InputStreamReader(System.in), 32*1024*1024);
String line;
// readLine() returns null at end of input; ready() only reports whether a read would block
while ((line = br.readLine()) != null) {
// process line
}
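As a fuller sketch of the loop above, the following program reads standard input until readLine() returns null and splits each line on single spaces. The class name StdinLines and the helper splitOnSpace are hypothetical names chosen for illustration. Replacing String.split with a manual indexOf scan is an assumption about what may be cheaper for this input, so it is worth measuring before relying on it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class StdinLines {
    // Hypothetical helper: splits a line on single spaces by scanning with indexOf.
    // Unlike String.split(" "), it keeps trailing empty fields.
    public static String[] splitOnSpace(String line) {
        List<String> parts = new ArrayList<>(6); // lines have 4 or 6 words
        int start = 0;
        int space;
        while ((space = line.indexOf(' ', start)) >= 0) {
            parts.add(line.substring(start, space));
            start = space + 1;
        }
        parts.add(line.substring(start)); // last word, after the final space
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String line;
        long words = 0;
        // readLine() returns null once the piped input is exhausted
        while ((line = br.readLine()) != null) {
            String[] separated = splitOnSpace(line);
            words += separated.length; // placeholder for the real per-line processing
        }
        System.out.println(words);
    }
}
```

It could be run as `java StdinLines < data.txt`, where data.txt is a placeholder for the piped text file.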
From the BufferedReader doc page:
Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
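On the question of reading 100 or so lines at once: the BufferedReader already pulls input from the underlying stream in large chunks, so grouping lines is mostly a convenience for the processing step rather than an I/O optimization. If batches are wanted anyway, a minimal sketch could look like the following. BatchLines and readBatch are hypothetical names, and a StringReader stands in for System.in so that the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BatchLines {
    // Hypothetical helper: reads up to maxLines lines from br.
    // Returns an empty list once the end of input has been reached.
    public static List<String> readBatch(BufferedReader br, int maxLines) throws IOException {
        List<String> batch = new ArrayList<>(maxLines);
        String line;
        // The size check comes first so that no line is consumed when the batch is full.
        while (batch.size() < maxLines && (line = br.readLine()) != null) {
            batch.add(line);
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        // Sample rows taken from the question; in real use, wrap System.in instead.
        String sample = "392903840 a c b 293 32.90\n"
                      + "382049804 a c 390\n"
                      + "329084203 d e r 489 384.90\n";
        BufferedReader br = new BufferedReader(new StringReader(sample));
        List<String> batch;
        while (!(batch = readBatch(br, 2)).isEmpty()) {
            for (String row : batch) {
                String[] separated = row.split(" ");
                System.out.println(separated.length + " words: " + row);
            }
        }
    }
}
```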
Source: https://stackoverflow.com/questions/5172284/faster-way-than-scanner-or-bufferedreader-reading-multiline-data-from-stdin