Faster way than Scanner or BufferedReader reading multiline data from STDIN?

Submitted by 被刻印的时光 ゝ on 2019-12-08 04:54:56

Question


Note: I am currently coding in Java. I want to read input data into a string, one line (or more) at a time, and I expect a very large number of lines in total.

Right now I have implemented

Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
    String[] separated = in.nextLine().split(" ");
    ...
}

because within the line my inputs are space delimited.

Unfortunately, with millions of lines this process is VERY slow, and the Scanner is taking more time than my data processing. I looked into the java.io library and found a bunch of possibilities (ByteArrayInputStream, FileInputStream, BufferedInputStream, PipedInputStream), but I'm not sure which one to use. Which one should I use?

To be specific, my data is being piped in from a text file. Every line has either 4 or 6 words and ends with a newline character, and I need to analyze one line at a time, storing the (4 or 6) words in an array that I can work with temporarily. Data format:

392903840 a c b 293 32.90
382049804 a c 390
329084203 d e r 489 384.90
...

Is there a way for Scanner to read 1000 or so lines at a time and become efficient, or which of these classes should I use (to minimize run time)?

Sidenote: while experimenting I have tried:

java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReader(System.in));
while (stdin.ready()) {
    String[] separated = stdin.readLine().split(" ");
    ...
}

This worked well. I am just wondering which approach works best, and whether there is any way to, say, read 100 lines into memory at once and then process everything. There are too many options, and I am looking for the optimal solution.


Answer 1:


You should wrap your System.in with a BufferedInputStream like:

BufferedInputStream bis = new BufferedInputStream(System.in);
Scanner in = new Scanner(bis);

This minimises the number of reads made on System.in, because the BufferedInputStream fetches data in large chunks and serves later reads from memory, which improves efficiency.
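To make the effect of the wrapper concrete, here is a minimal sketch that counts how many read calls reach the underlying stream with and without a BufferedInputStream. The class ReadCountDemo, the helper CountingInputStream, and the method countReads are hypothetical names made up for this illustration, and an in-memory ByteArrayInputStream stands in for System.in so the comparison is repeatable.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCountDemo {
    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Consumes the data one byte at a time and returns the number of underlying read calls. */
    static long countReads(byte[] data, boolean buffered) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        // Assumption for the demo: the default BufferedInputStream buffer size is used.
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        try {
            while (in.read() != -1) {
                // discard the byte; only the number of calls matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered calls: " + countReads(data, false));
        System.out.println("buffered calls:   " + countReads(data, true));
    }
}
```

With the wrapper in place, the byte-at-a-time loop is served from the in-memory buffer, so far fewer calls reach the source. On a real System.in each of those calls can be much more expensive than on the in-memory stand-in used here.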

Also, if you're only reading lines, you don't really need a Scanner, just a BufferedReader (which has a readLine() method to get the next line and a ready() method to check whether more data can be read without blocking).

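You would use it like this (see the example in the Java 6 InputStreamReader documentation):

(I added a buffer-size argument of 32*1024*1024 to BufferedReader. The size is measured in chars, so this is 32M chars rather than 32 MB.)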

BufferedReader br = new BufferedReader(new InputStreamReader(System.in), 32*1024*1024);
while (br.ready()) {
    String line = br.readLine();
    // process line
}
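One caveat about the loop above: ready() reports whether a read would block, not whether the stream has ended, so with piped input it may return false while more data is still on its way. A common alternative is to loop until readLine() returns null. The following is a minimal sketch of that idiom applied to the space-delimited lines from the question. The class LineSplitDemo and the method forEachLine are hypothetical names, and the default buffer size is an assumption made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class LineSplitDemo {
    /**
     * Reads the source line by line, splits each line on single spaces,
     * and hands the resulting fields to the handler one line at a time.
     */
    static void forEachLine(Reader source, Consumer<String[]> handler) {
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() returns null only at end of stream,
            // so this loop does not depend on ready().
            while ((line = br.readLine()) != null) {
                handler.accept(line.split(" "));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] lineCount = {0};
        forEachLine(new InputStreamReader(System.in), fields -> lineCount[0]++);
        System.out.println(lineCount[0] + " lines read");
    }
}
```

Passing a handler keeps only one line's fields in memory at a time, which matches the one-line-at-a-time processing described in the question. Run it as `java LineSplitDemo < data.txt`, where data.txt is a placeholder file name.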

From the BufferedReader doc page:

Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.



Source: https://stackoverflow.com/questions/5172284/faster-way-than-scanner-or-bufferedreader-reading-multiline-data-from-stdin
