How to split a string, but also keep the delimiters?

我在风中等你 2020-11-21 06:32

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
23 answers
  • 2020-11-21 06:42

    Quick answer: split on zero-width boundaries such as \b. I have used this in PHP and JS and will experiment to see whether it works here.

    It is possible and more or less works, but it may split too much. It depends on the string you want to split and the result you need; give more details and we can help you better.

    Another way is to write your own split: capture each delimiter (assuming it varies) and add it to the result as you go.

    My quick test:

    String str = "'ab','cd','eg'";
    String[] stra = str.split("\\b");
    for (String s : stra) System.out.print(s + "|");
    System.out.println();
    

    Result:

    '|ab|','|cd|','|eg|'|
    

    A bit too much... :-)
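
The do-it-yourself approach mentioned above (capture each delimiter and add it to the result) could look roughly like the following. This is only a minimal sketch: the class name `ManualSplit`, the helper `splitKeepingDelimiters`, and the sample delimiter set `[+-]` are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualSplit {
    // Hypothetical helper: returns text segments and delimiter matches in input order.
    static List<String> splitKeepingDelimiters(String input, String delimiterRegex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(input);
        int last = 0; // end of the previous delimiter
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text before the delimiter
            }
            if (m.end() > m.start()) {
                parts.add(m.group()); // the delimiter itself (empty matches are skipped)
            }
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing text after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        // Sample input and delimiter set are assumptions for illustration.
        System.out.println(splitKeepingDelimiters("a1+b2-c3", "[+-]"));
        // prints [a1, +, b2, -, c3]
    }
}
```

Unlike splitting on \b, this only breaks the string where the given delimiter pattern matches, so quotes and other punctuation stay attached to their text.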

  • 2020-11-21 06:43

    Pass true as the third argument (returnDelims). The tokenizer will then return the delimiters as tokens as well.

    new StringTokenizer(str, delimiters, true);
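
A small runnable sketch of the StringTokenizer approach, for reference. The class name `TokenizerDemo`, the helper `tokensWithDelimiters`, and the sample input are assumptions for illustration. Note that StringTokenizer treats the delimiter string as a set of individual characters, not as a regex or a multi-character delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Hypothetical helper: collects all tokens, including delimiter characters.
    static List<String> tokensWithDelimiters(String input, String delimiterChars) {
        // Third argument true = return each delimiter character as its own token.
        StringTokenizer st = new StringTokenizer(input, delimiterChars, true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // ';' and '|' are sample delimiters chosen for this sketch.
        System.out.println(tokensWithDelimiters("Text1;Text2|Text3", ";|"));
        // prints [Text1, ;, Text2, |, Text3]
    }
}
```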
    
  • 2020-11-21 06:44

    I know this is a very old question and that an answer has already been accepted, but I would still like to submit a very simple answer to the original question. Consider this code:

    String str = "Hello-World:How\nAre You&doing";
    String[] inputs = str.split("(?!^)\\b");
    for (int i=0; i<inputs.length; i++) {
       System.out.println("a[" + i + "] = \"" + inputs[i] + '"');
    }
    

    OUTPUT:

    a[0] = "Hello"
    a[1] = "-"
    a[2] = "World"
    a[3] = ":"
    a[4] = "How"
    a[5] = "
    "
    a[6] = "Are"
    a[7] = " "
    a[8] = "You"
    a[9] = "&"
    a[10] = "doing"
    

    I am simply splitting on the word boundary \b, except at the start of the text.
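
Two properties of \b are worth keeping in mind with this approach: a run of consecutive non-word characters comes back as one token, and the underscore counts as a word character. The sketch below illustrates both; the class name `WordBoundaryDemo`, the helper name, and the sample strings are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

class WordBoundaryDemo {
    // Same regex as in the answer above: a word boundary that is not at the start.
    static List<String> splitOnWordBoundaries(String s) {
        return Arrays.asList(s.split("(?!^)\\b"));
    }

    public static void main(String[] args) {
        // " => " contains no word characters, so it stays together as one token.
        for (String token : splitOnWordBoundaries("x => y")) {
            System.out.println("[" + token + "]");
        }
        // '_' is a word character, so no boundary is found inside the identifier.
        System.out.println(splitOnWordBoundaries("snake_case").size()); // prints 1
    }
}
```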

  • 2020-11-21 06:44

    I got here late, but returning to the original question, why not just use lookarounds?

    Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
    System.out.println(Arrays.toString(p.split("'ab','cd','eg'")));
    System.out.println(Arrays.toString(p.split("boo:and:foo")));
    

    output:

    [', ab, ',', cd, ',', eg, ']
    [boo, :, and, :, foo]
    

    EDIT: What you see above is what appears on the command line when I run that code, but I now see that it's a bit confusing. It's difficult to keep track of which commas are part of the result and which were added by Arrays.toString(). SO's syntax highlighting isn't helping either. In hopes of getting the highlighting to work with me instead of against me, here's how those arrays would look if I were declaring them in source code:

    { "'", "ab", "','", "cd", "','", "eg", "'" }
    { "boo", ":", "and", ":", "foo" }
    

    I hope that's easier to read. Thanks for the heads-up, @finnw.
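
The same lookaround idea can be aimed at a specific set of delimiters instead of the word/non-word transition. The following is a minimal sketch assuming the delimiters are the single characters ';' and '|' (chosen only for illustration); the class name `LookaroundSplit` and its members are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class LookaroundSplit {
    // Zero-width split points: right after a delimiter, or right before one.
    // The delimiter set [;|] is an assumption made for this sketch.
    static final String BOUNDARY = "(?<=[;|])|(?=[;|])";

    static List<String> split(String input) {
        return Arrays.asList(input.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("Text1;Text2|Text3"));
        // prints [Text1, ;, Text2, |, Text3]
    }
}
```

Because both alternatives are zero-width, nothing is consumed by the split, which is why each delimiter survives as its own array element.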

  • 2020-11-21 06:44

    I don't know of an existing function in the Java API that does this (which is not to say it doesn't exist), but here's my own implementation (one or more delimiters will be returned as a single token; if you want each delimiter to be returned as a separate token, it will need a bit of adaptation):

    // Needs: import java.util.LinkedList;
    static String[] splitWithDelimiters(String s) {
        if (s == null || s.length() == 0) {
            return new String[0];
        }
        LinkedList<String> result = new LinkedList<String>();
        StringBuilder sb = null;
        // Start with the opposite class of the first character so that
        // the first loop iteration opens the first token.
        boolean wasLetterOrDigit = !Character.isLetterOrDigit(s.charAt(0));
        for (char c : s.toCharArray()) {
            // A change of character class (alphanumeric vs. not) ends the current token.
            if (Character.isLetterOrDigit(c) ^ wasLetterOrDigit) {
                if (sb != null) {
                    result.add(sb.toString());
                }
                sb = new StringBuilder();
                wasLetterOrDigit = !wasLetterOrDigit;
            }
            sb.append(c);
        }
        // Flush the last token.
        result.add(sb.toString());
        return result.toArray(new String[0]);
    }
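
The adaptation mentioned above (each delimiter returned as a separate token) might look something like this. It is a sketch of one possible variant, not the answer author's code: the class name `SeparateDelimiters` and the method `splitEachDelimiter` are hypothetical, and, as in the original, every character that is not a letter or digit is treated as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

class SeparateDelimiters {
    // Hypothetical variant: every non-alphanumeric character becomes its own token.
    static List<String> splitEachDelimiter(String s) {
        List<String> result = new ArrayList<>();
        if (s == null) {
            return result;
        }
        StringBuilder word = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c); // keep building the current text token
            } else {
                if (word.length() > 0) {
                    result.add(word.toString()); // flush the text token
                    word.setLength(0);
                }
                result.add(String.valueOf(c)); // delimiter as a separate token
            }
        }
        if (word.length() > 0) {
            result.add(word.toString()); // flush the final text token
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitEachDelimiter("ab+-cd"));
        // prints [ab, +, -, cd]
    }
}
```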
    
  • 2020-11-21 06:46

    I tweaked Pattern.split() so that each matched delimiter is added to the result list as well.

    Added:

    // add match to the list
    matchList.add(input.subSequence(start, end).toString());
    

    Full source:

    public static String[] inclusiveSplit(String input, String re, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
    
        Pattern pattern = Pattern.compile(re);
        Matcher m = pattern.matcher(input);
    
        // Add segments before each match found
        while (m.find()) {
            int end = m.end();
            if (!matchLimited || matchList.size() < limit - 1) {
                int start = m.start();
                String match = input.subSequence(index, start).toString();
                matchList.add(match);
                // add match to the list
                matchList.add(input.subSequence(start, end).toString());
                index = end;
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index, input.length())
                        .toString();
                matchList.add(match);
                index = end;
            }
        }
    
        // If no match was found, return this
        if (index == 0)
            return new String[] { input.toString() };
    
        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());
    
        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }
    