I just learned about Java\'s Scanner class and now I\'m wondering how it compares/competes with the StringTokenizer and String.Split. I know that the StringTokenizer and Str
For the default scenarios I would suggest Pattern.split() as well but if you need maximum performance (especially on Android all solutions I tested are quite slow) and you only need to split by a single char, I now use my own method:
public static ArrayList<String> splitBySingleChar(final char[] s,
final char splitChar) {
final ArrayList<String> result = new ArrayList<String>();
final int length = s.length;
int offset = 0;
int count = 0;
for (int i = 0; i < length; i++) {
if (s[i] == splitChar) {
if (count > 0) {
result.add(new String(s, offset, count));
}
offset = i + 1;
count = 0;
} else {
count++;
}
}
if (count > 0) {
result.add(new String(s, offset, count));
}
return result;
}
Use "abc".toCharArray() to get the char array for a String. For example:
String s = " a bb ccc ffffdd eeeee ffffff ggggggg ";
ArrayList<String> result = splitBySingleChar(s.toCharArray(), ' ');
String.split() works very good but has its own boundaries, like if you wanted to split a string as shown below based on single or double pipe (|) symbol, it doesn't work. In this situation you can use StringTokenizer.
ABC|IJK
I recently did some experiments about the bad performance of String.split() in highly performance sensitive situations. You may find this useful.
http://eblog.chrononsystems.com/hidden-evils-of-javas-stringsplit-and-stringr
The gist is that String.split() compiles a Regular Expression pattern each time and can thus slow down your program, compared to if you use a precompiled Pattern object and use it directly to operate on a String.
StringTokenizer was always there. It is the fastest of all, but the enumeration-like idiom might not look as elegant as the others.
split came to existence on JDK 1.4. Slower than tokenizer but easier to use, since it is callable from the String class.
Scanner came to be on JDK 1.5. It is the most flexible and fills a long standing gap on the Java API to support an equivalent of the famous Cs scanf function family.
Split is slow, but not as slow as Scanner. StringTokenizer is faster than split. However, I found that I could obtain double the speed, by trading some flexibility, to get a speed-boost, which I did at JFastParser https://github.com/hughperkins/jfastparser
Testing on a string containing one million doubles:
Scanner: 10642 ms
Split: 715 ms
StringTokenizer: 544ms
JFastParser: 290ms
If you have a String object you want to tokenize, favor using String's split method over a StringTokenizer. If you're parsing text data from a source outside your program, like from a file, or from the user, that's where a Scanner comes in handy.