What is the easiest/best/most correct way to iterate through the characters of a string in Java?

挽巷 2020-11-22 11:14

StringTokenizer? Convert the String to a char[] and iterate over that? Something else?

15 answers
  • 2020-11-22 11:30

    Two options

    for (int i = 0, n = s.length(); i < n; i++) {
        char c = s.charAt(i);
        // process c
    }
    

    or

    for(char c : s.toCharArray()) {
        // process c
    }
    

    The first is probably faster, since toCharArray() copies the string's characters into a new array; the second is probably more readable.
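
    A minimal sketch of the two loops doing the same job, to make the trade-off easier to see. The class name CharLoops and the counting task are made up for illustration; only length(), charAt() and toCharArray() come from the answer itself.

```java
final class CharLoops {
    // Option 1: index-based loop; reads each char in place without copying the string.
    static int countWithCharAt(String s, char target) {
        int count = 0;
        for (int i = 0, n = s.length(); i < n; i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Option 2: for-each loop; toCharArray() allocates a fresh char[] copy first.
    static int countWithToCharArray(String s, char target) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (c == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithCharAt("hello world", 'o'));      // prints 2
        System.out.println(countWithToCharArray("hello world", 'o')); // prints 2
    }
}
```

    Both methods return the same result; the second pays for one extra array allocation per call, which rarely matters for short strings.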

  • 2020-11-22 11:32

    StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:

    String[] theChars = str.split("|");
    

    But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:

    StringTokenizer st = new StringTokenizer(str, str, true);
    

    However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
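
    To make the overhead concrete, here is a rough sketch. The class OneCharStrings and its method names are hypothetical, and the split() result assumes Java 8 or later, where a zero-width match at the start of the input no longer produces a leading empty string. Both tricks hand back String objects, while the charAt() loop works on primitives directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class OneCharStrings {
    // A regex that matches the empty string splits between every pair of chars.
    static String[] viaSplit(String str) {
        return str.split("|");
    }

    // Every character of str is a delimiter, and delimiters are returned as tokens.
    static List<String> viaTokenizer(String str) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, str, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // The plain loop: each charAt() call yields a primitive char, no extra objects.
    static int countDigits(String str) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (Character.isDigit(str.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaSplit("abc"))); // prints [a, b, c]
        System.out.println(viaTokenizer("abc"));              // prints [a, b, c]
        System.out.println(countDigits("a1b22"));             // prints 3
    }
}
```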

  • 2020-11-22 11:35

    See The Java Tutorials: Strings.

    public class StringDemo {
        public static void main(String[] args) {
            String palindrome = "Dot saw I was Tod";
            int len = palindrome.length();
            char[] tempCharArray = new char[len];
            char[] charArray = new char[len];
    
            // put original string in an array of chars
            for (int i = 0; i < len; i++) {
                tempCharArray[i] = palindrome.charAt(i);
            } 
    
            // reverse array of chars
            for (int j = 0; j < len; j++) {
                charArray[j] = tempCharArray[len - 1 - j];
            }
    
            String reversePalindrome =  new String(charArray);
            System.out.println(reversePalindrome);
        }
    }
    

    Store the length in an int len and iterate with a for loop.

  • 2020-11-22 11:36

    If you need to iterate through the code points of a String (see this answer) a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:

    for(int c : string.codePoints().toArray()){
        ...
    }
    

    or using the stream directly instead of a for loop:

    string.codePoints().forEach(c -> ...);
    

    There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).
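
    As an illustration of how the two streams differ, the sketch below runs both over a string containing one supplementary character. The class CodePointWalk, its method names and the sample string are assumptions chosen for the example; codePoints() and chars() are the methods named above.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    // Formats each code point as U+XXXX; surrogate pairs arrive already combined.
    static List<String> describeCodePoints(String s) {
        List<String> out = new ArrayList<>();
        for (int cp : s.codePoints().toArray()) {
            out.add(String.format("U+%04X", cp));
        }
        return out;
    }

    // chars() yields UTF-16 code units, so a supplementary character counts twice.
    static long countChars(String s) {
        return s.chars().count();
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as its surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(describeCodePoints(s)); // prints [U+0061, U+1F600, U+0062]
        System.out.println(countChars(s));         // prints 4
    }
}
```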

  • 2020-11-22 11:37

    In Java 8 we can solve it as:

    String str = "xyz";
    str.chars().forEachOrdered(i -> System.out.print((char)i));
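    // Character.toChars() keeps supplementary code points intact; a (char) cast would truncate them
    str.codePoints().forEachOrdered(i -> System.out.print(Character.toChars(i)));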
    

    The method chars() returns an IntStream, as the documentation says:

    Returns a stream of int zero-extending the char values from this sequence. Any char which maps to a surrogate code point is passed through uninterpreted. If the sequence is mutated while the stream is being read, the result is undefined.

    The method codePoints() also returns an IntStream, as the documentation says:

    Returns a stream of code point values from this sequence. Any surrogate pairs encountered in the sequence are combined as if by Character.toCodePoint and the result is passed to the stream. Any other code units, including ordinary BMP characters, unpaired surrogates, and undefined code units, are zero-extended to int values which are then passed to the stream.

    How do a char and a code point differ? As mentioned in this article:

    Unicode 3.1 added supplementary characters, bringing the total number of characters to more than the 2^16 = 65536 characters that can be distinguished by a single 16-bit char. Therefore, a char value no longer has a one-to-one mapping to the fundamental semantic unit in Unicode. JDK 5 was updated to support the larger set of character values. Instead of changing the definition of the char type, some of the new supplementary characters are represented by a surrogate pair of two char values. To reduce naming confusion, a code point will be used to refer to the number that represents a particular Unicode character, including supplementary ones.
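
    The surrogate-pair mechanism described in the quote can be seen with the methods added in JDK 5. This is only a sketch: the class SurrogateDemo and the sample string are invented for the example, while codePointAt(), Character.charCount() and Character.isHighSurrogate() are standard library methods.

```java
final class SurrogateDemo {
    // Counts code points by stepping one or two chars at a time.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // 2 for a supplementary character, else 1
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as its surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                             // prints 4
        System.out.println(countCodePoints(s));                     // prints 3
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // prints true
    }
}
```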

    Finally, why forEachOrdered and not forEach?

    The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of the stream in its encounter order, if the stream has a defined encounter order. So forEach does not guarantee that the order will be kept. Also check this question for more.
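
    The ordering guarantee matters most once a stream is parallel. A small sketch, assuming a hypothetical helper class OrderedChars: even with parallel() in the pipeline, forEachOrdered still visits the characters in encounter order, so the string is rebuilt unchanged.

```java
final class OrderedChars {
    // Rebuilds the string char by char from a parallel stream.
    static String collectOrdered(String str) {
        StringBuilder sb = new StringBuilder();
        // forEachOrdered runs the action one element at a time, in encounter order,
        // so appending to a plain StringBuilder is safe here. With forEach on a
        // parallel stream neither the order nor that safety would be guaranteed.
        str.chars().parallel().forEachOrdered(i -> sb.append((char) i));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collectOrdered("xyz")); // prints xyz
    }
}
```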

    For the difference between a character, a code point, a glyph and a grapheme, check this question.

  • 2020-11-22 11:39

    I use a for loop to iterate over the string and call charAt() to get each character to examine. Since String is implemented with an array, charAt() is a constant-time operation.

    String s = "...stuff...";
    
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // process c
    }
    

    That's what I would do. It seems the easiest to me.

    As for correctness, I don't believe there is a single correct way here; it comes down to personal style.
