Scanner vs. StringTokenizer vs. String.split

太阳男子 2020-11-22 10:56

I just learned about Java's Scanner class and now I'm wondering how it compares/competes with the StringTokenizer and String.split. I know that the StringTokenizer and Str

10 Answers
  • 2020-11-22 11:37

    For the default scenarios I would also suggest Pattern.split(). But if you need maximum performance (especially on Android, where all the solutions I tested were quite slow) and you only need to split on a single char, you can use a hand-written method like the one I now use:

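    // Splits s on splitChar in a single pass, without regular expressions.
    // Runs of consecutive delimiters produce no empty tokens.
    // Requires: import java.util.ArrayList;
    public static ArrayList<String> splitBySingleChar(final char[] s,
            final char splitChar) {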
        final ArrayList<String> result = new ArrayList<String>();
        final int length = s.length;
        int offset = 0;
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (s[i] == splitChar) {
                if (count > 0) {
                    result.add(new String(s, offset, count));
                }
                offset = i + 1;
                count = 0;
            } else {
                count++;
            }
        }
        if (count > 0) {
            result.add(new String(s, offset, count));
        }
        return result;
    }
    

    Use "abc".toCharArray() to get the char array for a String. For example:

    String s = "     a bb   ccc  ffffdd eeeee  ffffff    ggggggg ";
    ArrayList<String> result = splitBySingleChar(s.toCharArray(), ' ');
    
  • 2020-11-22 11:41

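    String.split() works very well but has its limits. For example, splitting the string below on a single or double pipe (|) does not work as expected if you pass "|" directly, because split() treats its argument as a regular expression and | is the alternation metacharacter; you would have to escape it (e.g. "\\|"). StringTokenizer takes literal delimiter characters, so in this situation you can use it without any escaping.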

    ABC|IJK
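    To make the pipe case concrete, here is a minimal sketch. The class and method names (PipeSplitDemo, splitOnPipe, tokenizeOnPipe) are made up for illustration; only String.split() and StringTokenizer come from the JDK. It assumes you want "one or more pipes" treated as a single separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original answer.
public class PipeSplitDemo {

    // Escapes the pipe so the regex engine sees a literal '|';
    // the '+' lets a single or a double pipe act as one separator.
    static List<String> splitOnPipe(String input) {
        return Arrays.asList(input.split("\\|+"));
    }

    // StringTokenizer treats "|" as a plain delimiter character
    // and never returns empty tokens between adjacent delimiters.
    static List<String> tokenizeOnPipe(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, "|");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "ABC|IJK";
        // Unescaped "|" is an empty alternation, so it matches the empty
        // string between characters instead of the pipe itself.
        System.out.println(Arrays.toString(input.split("|")));
        System.out.println(splitOnPipe(input));    // [ABC, IJK]
        System.out.println(tokenizeOnPipe(input)); // [ABC, IJK]
    }
}
```

    Both helper methods give the same tokens for ABC|IJK and ABC||IJK; the difference is only whether you have to think about regex escaping.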

  • 2020-11-22 11:44

    I recently ran some experiments on the poor performance of String.split() in highly performance-sensitive situations. You may find this useful.

    http://eblog.chrononsystems.com/hidden-evils-of-javas-stringsplit-and-stringr

    The gist is that String.split() compiles a regular expression pattern on each call (newer JDKs have a fast path for simple single-character delimiters, but not for general patterns), which can slow down your program compared to compiling a Pattern object once and using it directly to split each String.
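    A minimal sketch of the precompiled-Pattern approach. The class name, method names, and the comma-with-optional-whitespace delimiter are assumptions chosen for illustration, not taken from the linked article.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class PrecompiledSplitDemo {

    // Compiled once and reused for every call. Pattern instances are
    // immutable and safe to share between threads.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    static String[] splitWithPrecompiledPattern(String line) {
        return COMMA.split(line);
    }

    // Same result, but the regex string is handed to String.split()
    // again on every call.
    static String[] splitWithStringSplit(String line) {
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String line = "a , b,c";
        System.out.println(Arrays.toString(splitWithPrecompiledPattern(line))); // [a, b, c]
        System.out.println(Arrays.toString(splitWithStringSplit(line)));        // [a, b, c]
    }
}
```

    Whether the difference matters depends on how often the split runs; in a tight loop over many lines, hoisting the Pattern into a constant is a cheap change.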

  • 2020-11-22 11:48

    StringTokenizer has been part of Java since JDK 1.0. It is the fastest of all, but the enumeration-like idiom might not look as elegant as the others.

    split came into existence in JDK 1.4. It is slower than the tokenizer but easier to use, since it is callable directly on the String class.

    Scanner arrived in JDK 1.5. It is the most flexible and fills a long-standing gap in the Java API: an equivalent of C's famous scanf function family.
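    As a rough illustration of the scanf-like use of Scanner, the sketch below reads an int, a double, and a word from one string. The class name, method name, and input layout are assumptions made up for this example.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
public class ScanfLikeDemo {

    // Reads "<int> <double> <word>" and returns "word:int:double".
    static String describe(String input) {
        try (Scanner sc = new Scanner(input)) {
            // Fix the locale so "." is the decimal separator
            // regardless of the machine's default locale.
            sc.useLocale(Locale.US);
            int count = sc.nextInt();
            double price = sc.nextDouble();
            String name = sc.next();
            return name + ":" + count + ":" + price;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("3 3.5 widget")); // widget:3:3.5
    }
}
```

    Doing the same with split or StringTokenizer means parsing each token yourself with Integer.parseInt and Double.parseDouble, which is the flexibility-for-speed trade-off described above.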

  • 2020-11-22 11:49

    Split is slow, but not as slow as Scanner. StringTokenizer is faster than split. However, I found that I could roughly double the speed by trading away some flexibility, which I did in JFastParser: https://github.com/hughperkins/jfastparser

    Testing on a string containing one million doubles:

    Scanner: 10642 ms
    Split: 715 ms
    StringTokenizer: 544 ms
    JFastParser: 290 ms
    
  • 2020-11-22 11:51

    If you have a String object you want to tokenize, favor using String's split method over a StringTokenizer. If you're parsing text data from a source outside your program, like from a file, or from the user, that's where a Scanner comes in handy.
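    A small sketch of the "outside source" case: Scanner wraps any Readable, so the same code works for a file, a socket, or System.in. Here a StringReader stands in for the external source, and the class and method names are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
public class ExternalInputDemo {

    // Sums every integer token found in the source and skips everything else.
    static int sumIntegers(Reader source) {
        int sum = 0;
        try (Scanner sc = new Scanner(source)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();
                } else {
                    sc.next(); // not an integer: consume and ignore the token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // StringReader is a stand-in for a file or user input.
        Reader source = new StringReader("10 apples\n20 pears\n12");
        System.out.println(sumIntegers(source)); // 42
    }
}
```

    For a String you already hold in memory, a plain line.split(",") is usually the shorter route, as the answer suggests.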
