How to split a string, but also keep the delimiters?

我在风中等你 2020-11-21 06:32

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
23 answers
  •  时光说笑
    2020-11-21 06:47

    One subtlety in this question is the leading delimiter: if you build a combined array of tokens and delimiters, you need to know whether it starts with a token or with a delimiter. You could simply assume that a leading delimiter should be discarded, but that assumption seems unjustified. You may also want to know whether there is a trailing delimiter. The code below sets two boolean flags accordingly.

    It is written in Groovy, but a Java version should be fairly straightforward:

                // phraseForTokenising is the input String, assumed to be defined earlier
                String tokenRegex = /[\p{L}\p{N}]+/ // a String in Groovy; matches runs of Unicode letters and digits
                def finder = phraseForTokenising =~ tokenRegex
                // NB in Groovy the variable 'finder' is then of class java.util.regex.Matcher
                def finderIt = finder.iterator() // extra method added to Matcher by Groovy
                int start = 0 // end of the previous token; stays 0 until the first token is found
                boolean leadingDelim = false, trailingDelim = false
                def combinedTokensAndDelims = [] // an empty List in Groovy
    
                while( finderIt.hasNext() )
                {
                    def token = finderIt.next()
                    int finderStart = finder.start()
                    // text between the previous token and this one; substring() avoids the
                    // Groovy range 0..-1, which would select the whole String
                    String delim = phraseForTokenising.substring( start, finderStart )
                    if( start == 0 ) leadingDelim = finderStart != 0
                    if( start > 0 || leadingDelim ) combinedTokensAndDelims << delim
                    combinedTokensAndDelims << token // add element to end of list
                    start = finder.end()
                }
                // start == 0 indicates no tokens found
                if( start > 0 ) {
                    // finish by seeing whether there is a trailing delim
                    trailingDelim = start < phraseForTokenising.length()
                    if( trailingDelim ) combinedTokensAndDelims << phraseForTokenising.substring( start )
    
                    println( "leading delim? $leadingDelim, trailing delim? $trailingDelim, combined array:\n $combinedTokensAndDelims" )
    
                }
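
    For readers who want the Java version mentioned above, here is a minimal sketch of the same approach. The class name DelimiterKeepingTokenizer, the Result holder, and the sample input in main are hypothetical names chosen for illustration and are not part of the original answer. As in the Groovy code, a "token" is assumed to be a run of Unicode letters and digits, and everything else counts as delimiter text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of the Groovy logic, not a definitive implementation.
class DelimiterKeepingTokenizer {

    /** Tokens and delimiters in input order, plus flags describing both ends. */
    static final class Result {
        final List<String> parts = new ArrayList<>();
        boolean leadingDelim = false;
        boolean trailingDelim = false;
    }

    // Same token definition as the Groovy code: runs of Unicode letters and digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    static Result tokenize(String phrase) {
        Result result = new Result();
        Matcher finder = TOKEN.matcher(phrase);
        int start = 0; // end of the previous token; stays 0 until the first token is found

        while (finder.find()) {
            // Any text between the previous token and this one is a delimiter.
            if (finder.start() > start) {
                if (start == 0) {
                    result.leadingDelim = true; // input begins with delimiter text
                }
                result.parts.add(phrase.substring(start, finder.start()));
            }
            result.parts.add(finder.group());
            start = finder.end();
        }

        // start == 0 means no token was found; like the Groovy code, leave the result empty.
        if (start > 0 && start < phrase.length()) {
            result.trailingDelim = true;
            result.parts.add(phrase.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input is an assumption for illustration only.
        Result r = tokenize("--hello, world!");
        System.out.println("leading delim? " + r.leadingDelim
                + ", trailing delim? " + r.trailingDelim
                + ", combined array: " + r.parts);
    }
}
```

    Returning the two flags alongside the list lets a caller tell whether the first and last elements are tokens or delimiters without re-running the regex over them.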
    
