How to split a string, but also keep the delimiters?

我在风中等你 2020-11-21 06:32

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
23 answers
  • 2020-11-21 06:57

    You can use lookahead and lookbehind, like this:

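    // Requires import java.util.Arrays;
    System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));          // split after each ;
    System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));           // split before each ;
    System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));  // split before and after each ;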

    And you will get:

    [a;, b;, c;, d]
    [a, ;b, ;c, ;d]
    [a, ;, b, ;, c, ;, d]

    The last one is what you want.

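    ((?<=;)|(?=;)) matches the empty string at every position that is either preceded by ; (the lookbehind) or followed by ; (the lookahead), i.e. just after or just before each ;. Splitting at those positions puts every ; in its own element.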
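
    The question involves several different delimiters. Assuming each delimiter is a single character, the same lookaround trick extends to a character class. The delimiter set [;,|] and the class name below are illustrative assumptions, not part of the original answer:

    ```java
    import java.util.Arrays;

    class MultiDelimiterSplit {
        // Zero-width split points: just after (?<=...) or just before (?=...)
        // any one of the example delimiters ';', ',' or '|'.
        // Inside a character class, '|' is an ordinary character.
        static final String SPLIT_AROUND_DELIMITERS = "((?<=[;,|])|(?=[;,|]))";

        static String[] split(String input) {
            return input.split(SPLIT_AROUND_DELIMITERS);
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(split("a;b,c|d"))); // prints [a, ;, b, ,, c, |, d]
        }
    }
    ```

    Adjacent delimiters each come out as separate elements, so "a;;b" yields [a, ;, ;, b].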

    Hope this helps.

    EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regexes. One thing I do to ease this is to create a constant whose name describes what the regex does, and to use Java's String.format to fill in the delimiter, like this:

    // %1$s is replaced by the delimiter regex in both the lookbehind and the lookahead.
    public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
    ...
    public void someMethod() {
        ...
        final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
        ...
    }
    ...
    

    This helps a little bit. :-D
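
    A self-contained sketch of the same WITH_DELIMITER idea, assuming the delimiter is a literal string rather than a regex. The Pattern.quote call is an addition not in the original answer: without it, a delimiter such as "." or "|" would be interpreted as a regex metacharacter. The class and method names are hypothetical:

    ```java
    import java.util.Arrays;
    import java.util.regex.Pattern;

    class DelimiterSplit {
        // %1$s is filled in twice with the same (quoted) delimiter.
        static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

        // Splits input around every occurrence of the literal delimiter,
        // keeping each delimiter as its own element.
        static String[] splitKeepingDelimiter(String input, String delimiter) {
            String regex = String.format(WITH_DELIMITER, Pattern.quote(delimiter));
            return input.split(regex);
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(splitKeepingDelimiter("a.b.c", "."))); // prints [a, ., b, ., c]
        }
    }
    ```

    Multi-character literals also work, e.g. splitting "x||y" on "||" gives [x, ||, y].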

  • 2020-11-21 06:57

    I had a look at the above answers and honestly I find none of them satisfactory. What you want to do is essentially mimic Perl's split functionality. Why Java doesn't allow this, and doesn't have a join() method somewhere, is beyond me, but I digress. You don't even really need a class for this; it's just a function. Run this sample program:

    Some of the earlier answers have excessive null checking, a subject I recently wrote about in response to a question here:

    https://stackoverflow.com/users/18393/cletus

    Anyway, the code:

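    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Split {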
        public static List<String> split(String s, String pattern) {
            assert s != null;
            assert pattern != null;
            return split(s, Pattern.compile(pattern));
        }
    
        public static List<String> split(String s, Pattern pattern) {
            assert s != null;
            assert pattern != null;
            Matcher m = pattern.matcher(s);
            List<String> ret = new ArrayList<String>();
            int start = 0;
            while (m.find()) {
                ret.add(s.substring(start, m.start()));
                ret.add(m.group());
                start = m.end();
            }
            ret.add(start >= s.length() ? "" : s.substring(start));
            return ret;
        }
    
        private static void testSplit(String s, String pattern) {
            System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
            List<String> tokens = split(s, pattern);
            System.out.printf("Found %d matches%n", tokens.size());
            int i = 0;
            for (String token : tokens) {
                System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
            }
            System.out.println();
        }
    
        public static void main(String[] args) {
            testSplit("abcdefghij", "z"); // "abcdefghij"
            testSplit("abcdefghij", "f"); // "abcde", "f", "ghij"
            testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
            testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
            testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
        }
    }
    
  • 2020-11-21 07:02

    If you can afford to, use Java's String.replace(CharSequence target, CharSequence replacement) method to insert another delimiter to split on. Example: I want to split the string "boo:and:foo" and keep each ':' attached to the string on its right.

    String str = "boo:and:foo";
    str = str.replace(":","newdelimiter:");
    String[] tokens = str.split("newdelimiter");
    

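    Important note: this only works if "newdelimiter" does not already occur anywhere in your string! Thus it is not a general solution. But if you know a CharSequence that you can be sure will never appear in the string, this is a very simple solution. Also keep in mind that split() treats its argument as a regular expression, so the inserted delimiter should not contain regex metacharacters (or should be passed through Pattern.quote).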
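
    A guarded sketch of the replace trick. The sentinel "\u0000" is an assumption (any string you know cannot occur in the input would do), the explicit check turns the "no further newdelimiter in your String" condition into an error instead of a silent mis-split, and Pattern.quote makes split() treat the sentinel literally. The class and method names are hypothetical:

    ```java
    import java.util.Arrays;
    import java.util.regex.Pattern;

    class ReplaceSplit {
        // Hypothetical sentinel, assumed never to occur in real input.
        private static final String SENTINEL = "\u0000";

        // Keeps each delimiter attached to the token on its right,
        // as in the "boo:and:foo" example.
        static String[] splitKeepRight(String str, String delimiter) {
            if (str.contains(SENTINEL)) {
                throw new IllegalArgumentException("input already contains the sentinel");
            }
            String marked = str.replace(delimiter, SENTINEL + delimiter);
            return marked.split(Pattern.quote(SENTINEL));
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(splitKeepRight("boo:and:foo", ":"))); // prints [boo, :and, :foo]
        }
    }
    ```

    To keep the delimiter on the left-hand token instead, replace with delimiter + SENTINEL.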

  • 2020-11-21 07:06

    Here is a simple, clean implementation that is consistent with Pattern#split, works with variable-length patterns (which lookbehind cannot support), and is easier to use. It is similar to the solution provided by @cletus.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Split {
        public static String[] split(CharSequence input, String pattern) {
            return split(input, Pattern.compile(pattern));
        }

        public static String[] split(CharSequence input, Pattern pattern) {
            Matcher matcher = pattern.matcher(input);
            int start = 0;
            List<String> result = new ArrayList<>();
            while (matcher.find()) {
                // Text before the match, then the match (the delimiter) itself.
                result.add(input.subSequence(start, matcher.start()).toString());
                result.add(matcher.group());
                start = matcher.end();
            }
            // Skip the trailing empty string, as Pattern#split does.
            if (start != input.length()) result.add(input.subSequence(start, input.length()).toString());
            return result.toArray(new String[0]);
        }
    }
    

    I don't do null checks here; Pattern#split doesn't, so why should I? I don't like the if at the end, but it is required for consistency with Pattern#split. Otherwise I would append unconditionally, which would leave an empty string as the last element of the result whenever the input ends with the pattern.

    I convert to String[] for consistency with Pattern#split. I use new String[0] rather than new String[result.size()]; see here for why.

    Here are my tests:

    import org.junit.Assert;
    import org.junit.Test;

    public class SplitTest {
        @Test
        public void splitsVariableLengthPattern() {
            String[] result = Split.split("/foo/$bar/bas", "\\$\\w+");
            Assert.assertArrayEquals(new String[] { "/foo/", "$bar", "/bas" }, result);
        }

        @Test
        public void splitsEndingWithPattern() {
            String[] result = Split.split("/foo/$bar", "\\$\\w+");
            Assert.assertArrayEquals(new String[] { "/foo/", "$bar" }, result);
        }

        @Test
        public void splitsStartingWithPattern() {
            String[] result = Split.split("$foo/bar", "\\$\\w+");
            Assert.assertArrayEquals(new String[] { "", "$foo", "/bar" }, result);
        }

        @Test
        public void splitsNoMatchesPattern() {
            String[] result = Split.split("/foo/bar", "\\$\\w+");
            Assert.assertArrayEquals(new String[] { "/foo/bar" }, result);
        }
    }
    
  • 2020-11-21 07:09

    I suggest using Pattern and Matcher, which will almost certainly achieve what you want. Your regular expression will need to be somewhat more complicated than what you are using in String.split.
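
    One way to read this suggestion, sketched under the assumption that the delimiters are the single characters ;, , and | (an illustrative choice; the class and method names are hypothetical): instead of matching only the delimiters, match either a run of non-delimiter text or a single delimiter, and collect every match with Matcher.find(). The negated class also matches newlines, so a multiline string is handled without extra flags:

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class TokenizeWithMatcher {
        // Either a run of non-delimiter characters, or exactly one delimiter.
        private static final Pattern TOKEN = Pattern.compile("[^;,|]+|[;,|]");

        static List<String> tokens(String input) {
            List<String> result = new ArrayList<>();
            Matcher m = TOKEN.matcher(input);
            while (m.find()) {
                result.add(m.group()); // text and delimiters, in input order
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(tokens("a;b,c|d")); // prints [a, ;, b, ,, c, |, d]
        }
    }
    ```

    Unlike the split-based answers, this never produces empty strings (for example, a leading delimiter gives no leading ""); whether that is desirable depends on your use case.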
