Split string to equal length substrings in Java

日久生厌 2020-11-22 02:56

How can I split the string "Thequickbrownfoxjumps" into substrings of equal size in Java? E.g., "Thequickbrownfoxjumps" of 4 equal size should give th

21 answers
  • 2020-11-22 03:20

    I asked @Alan Moore in a comment to the accepted solution how strings with newlines could be handled. He suggested using DOTALL.

    Using his suggestion I created a small sample of how that works:

    public void regexDotAllExample() throws UnsupportedEncodingException {
        final String input = "The\nquick\nbrown\r\nfox\rjumps";
        final String regex = "(?<=\\G.{4})";
    
        Pattern splitByLengthPattern;
        String[] split;
    
        splitByLengthPattern = Pattern.compile(regex);
        split = splitByLengthPattern.split(input);
        System.out.println("---- Without DOTALL ----");
        for (int i = 0; i < split.length; i++) {
            byte[] s = split[i].getBytes("utf-8");
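            // Print the text with line terminators escaped; printing the byte[] itself only shows its identity hash
            System.out.println("[Idx: "+i+", length: "+s.length+"] - " + split[i].replace("\n", "\\n").replace("\r", "\\r"));
        }
        /* Output is a single entry longer than the desired split size:
        ---- Without DOTALL ----
        [Idx: 0, length: 26] - The\nquick\nbrown\r\nfox\rjumps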
         */
    
    
        // DOTALL suggested in Alan Moore's comment on SO: https://stackoverflow.com/a/3761521/1237974
        splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
        split = splitByLengthPattern.split(input);
        System.out.println("---- With DOTALL ----");
        for (int i = 0; i < split.length; i++) {
            byte[] s = split[i].getBytes("utf-8");
            // Print the text with line terminators escaped; printing the byte[] itself only shows its identity hash
            System.out.println("[Idx: "+i+", length: "+s.length+"] - " + split[i].replace("\n", "\\n").replace("\r", "\\r"));
        }
        /* Output is as desired: 7 entries, each with a maximum length of 4:
        ---- With DOTALL ----
        [Idx: 0, length: 4] - The\n
        [Idx: 1, length: 4] - quic
        [Idx: 2, length: 4] - k\nbr
        [Idx: 3, length: 4] - own\r
        [Idx: 4, length: 4] - \nfox
        [Idx: 5, length: 4] - \rjum
        [Idx: 6, length: 2] - ps
         */
    
    }
    

    But I also like @Jon Skeet's solution in https://stackoverflow.com/a/3760193/1237974. For maintainability in larger projects, where not everyone is equally experienced with regular expressions, I would probably use Jon's solution.
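
    For comparison, here is a minimal non-regex sketch. It is not copied from the linked answer; the class name ChunkDemo and the helper chunk are hypothetical names chosen for illustration. Because substring treats line terminators like any other character, no DOTALL flag is involved:

```java
import java.util.ArrayList;
import java.util.List;

class ChunkDemo {

    // Hypothetical helper: cuts s into pieces of at most `size` chars.
    static List<String> chunk(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>((s.length() + size - 1) / size);
        for (int start = 0; start < s.length(); start += size) {
            // The last piece may be shorter than `size`.
            parts.add(s.substring(start, Math.min(s.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = chunk("The\nquick\nbrown\r\nfox\rjumps", 4);
        System.out.println(parts.size()); // 7
        System.out.println(parts.get(6)); // ps
    }
}
```

    Note that this counts Java chars (UTF-16 code units), just as the regex version does, so a surrogate pair could still be cut in half.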

  • 2020-11-22 03:21

    In case you want to split the string equally backwards, i.e. from right to left, for example, to split 1010001111 to [10, 1000, 1111], here's the code:

    /**
     * @param s         the string to be split
     * @param subLen    length of the equal-length substrings.
     * @param backwards true if the splitting is from right to left, false otherwise
     * @return an array of equal-length substrings
     * @throws ArithmeticException if subLen == 0 (division by zero)
     */
    public static String[] split(String s, int subLen, boolean backwards) {
        assert s != null;
        int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
        String[] strs = new String[groups];
        if (backwards) {
            for (int i = 0; i < groups; i++) {
                int beginIndex = s.length() - subLen * (i + 1);
                int endIndex = beginIndex + subLen;
                if (beginIndex < 0)
                    beginIndex = 0;
                strs[groups - i - 1] = s.substring(beginIndex, endIndex);
            }
        } else {
            for (int i = 0; i < groups; i++) {
                int beginIndex = subLen * i;
                int endIndex = beginIndex + subLen;
                if (endIndex > s.length())
                    endIndex = s.length();
                strs[i] = s.substring(beginIndex, endIndex);
            }
        }
        return strs;
    }
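
    A usage sketch, assuming the logic above is wrapped in a hypothetical class BackwardSplitDemo. The method body here is a condensed restatement rather than a verbatim copy, so that the example compiles by itself:

```java
import java.util.Arrays;

class BackwardSplitDemo {

    // Condensed restatement of the split method above.
    static String[] split(String s, int subLen, boolean backwards) {
        int groups = (s.length() + subLen - 1) / subLen; // ceiling division
        String[] strs = new String[groups];
        for (int i = 0; i < groups; i++) {
            if (backwards) {
                // Count chunks from the right; only the leftmost one can be short.
                int endIndex = s.length() - subLen * i;
                int beginIndex = Math.max(0, endIndex - subLen);
                strs[groups - i - 1] = s.substring(beginIndex, endIndex);
            } else {
                // Count chunks from the left; only the rightmost one can be short.
                int beginIndex = subLen * i;
                int endIndex = Math.min(s.length(), beginIndex + subLen);
                strs[i] = s.substring(beginIndex, endIndex);
            }
        }
        return strs;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1010001111", 4, true)));  // [10, 1000, 1111]
        System.out.println(Arrays.toString(split("1010001111", 4, false))); // [1010, 0011, 11]
    }
}
```

    The only difference between the two directions is which end receives the short chunk.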
    
  • 2020-11-22 03:23
    public String[] splitInParts(String s, int partLength)
    {
        int len = s.length();
    
        // Number of parts
        int nparts = (len + partLength - 1) / partLength;
        String parts[] = new String[nparts];
    
        // Break into parts
        int offset= 0;
        int i = 0;
        while (i < nparts)
        {
            parts[i] = s.substring(offset, Math.min(offset + partLength, len));
            offset += partLength;
            i++;
        }
    
        return parts;
    }
    
  • 2020-11-22 03:24
    @Test
    public void regexSplit() {
        String source = "Thequickbrownfoxjumps";
        // define matcher, any char, min length 1, max length 4
        Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(source.substring(matcher.start(), matcher.end()));
        }
        String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
        assertArrayEquals(expected, result.toArray());
    }
    
  • 2020-11-22 03:25

    Here's the regex one-liner version:

    System.out.println(Arrays.toString(
        "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
    ));
    

    \G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.
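
    To see \G on its own, outside a lookbehind, here is a small sketch. The class GAnchorDemo and the helper findAll are hypothetical names used only for illustration. It compares a pattern with and without the anchor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {

    // Collects every match of `regex` in `input`, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Without \G, digits are found anywhere in the input.
        System.out.println(findAll("\\d", "123abc456"));    // [1, 2, 3, 4, 5, 6]
        // With \G, each match must start where the previous one ended,
        // so the scan stops at the first non-digit.
        System.out.println(findAll("\\G\\d", "123abc456")); // [1, 2, 3]
    }
}
```

    In the split one-liner the same anchoring is what keeps the cut points exactly four characters apart.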

    Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

    I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

    Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.

  • 2020-11-22 03:25

    If you're using Google's Guava general-purpose libraries (and quite honestly, any new Java project probably should be), this is insanely trivial with the Splitter class:

    for (String substring : Splitter.fixedLength(4).split(inputString)) {
        doSomethingWith(substring);
    }
    

    and that's it. Easy as!
