Given a string that isn\'t too long, what is the best way to read it line by line?
I know you can do:
BufferedReader reader = new BufferedReader(new
Or use new try with resources clause combined with Scanner:
try (Scanner scanner = new Scanner(value)) {
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
// process the line
}
}
Since I was especially interested in the efficiency angle, I created a little test class (below). Outcome for 5,000,000 lines:
Comparing line breaking performance of different solutions
Testing 5000000 lines
Split (all): 14665 ms
Split (CR only): 3752 ms
Scanner: 10005
Reader: 2060
As usual, exact times may vary, but the ratio holds true however often I've run it.
Conclusion: the "simpler" and "more efficient" requirements of the OP can't be satisfied simultaneously, the split
solution (in either incarnation) is simpler, but the Reader
implementation beats the others hands down.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
/**
* Test class for splitting a string into lines at linebreaks
*/
public class LineBreakTest {
/** Main method: pass in desired line count as first parameter (default = 10000). */
public static void main(String[] args) {
int lineCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
System.out.println("Comparing line breaking performance of different solutions");
System.out.printf("Testing %d lines%n", lineCount);
String text = createText(lineCount);
testSplitAllPlatforms(text);
testSplitWindowsOnly(text);
testScanner(text);
testReader(text);
}
private static void testSplitAllPlatforms(String text) {
long start = System.currentTimeMillis();
text.split("\n\r|\r");
System.out.printf("Split (regexp): %d%n", System.currentTimeMillis() - start);
}
private static void testSplitWindowsOnly(String text) {
long start = System.currentTimeMillis();
text.split("\n");
System.out.printf("Split (CR only): %d%n", System.currentTimeMillis() - start);
}
private static void testScanner(String text) {
long start = System.currentTimeMillis();
List<String> result = new ArrayList<>();
try (Scanner scanner = new Scanner(text)) {
while (scanner.hasNextLine()) {
result.add(scanner.nextLine());
}
}
System.out.printf("Scanner: %d%n", System.currentTimeMillis() - start);
}
private static void testReader(String text) {
long start = System.currentTimeMillis();
List<String> result = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
String line = reader.readLine();
while (line != null) {
result.add(line);
line = reader.readLine();
}
} catch (IOException exc) {
// quit
}
System.out.printf("Reader: %d%n", System.currentTimeMillis() - start);
}
private static String createText(int lineCount) {
StringBuilder result = new StringBuilder();
StringBuilder lineBuilder = new StringBuilder();
for (int i = 0; i < 20; i++) {
lineBuilder.append("word ");
}
String line = lineBuilder.toString();
for (int i = 0; i < lineCount; i++) {
result.append(line);
result.append("\n");
}
return result.toString();
}
}
You can also use the split
method of String:
String[] lines = myString.split(System.getProperty("line.separator"));
This gives you all lines in a handy array.
I don't know about the performance of split. It uses regular expressions.
You can also use:
String[] lines = someString.split("\n");
If that doesn't work try replacing \n
with \r\n
.
Solution using Java 8
features such as Stream API
and Method references
new BufferedReader(new StringReader(myString))
.lines().forEach(System.out::println);
or
public void someMethod(String myLongString) {
new BufferedReader(new StringReader(myLongString))
.lines().forEach(this::parseString);
}
private void parseString(String data) {
//do something
}
The easiest and most universal approach would be to just use the regex Linebreak matcher
\R
which matches Any Unicode linebreak sequence
:
Pattern NEWLINE = Pattern.compile("\\R")
String lines[] = NEWLINE.split(input)
@see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html