Match contents within square brackets, including nested square brackets

梦谈多话 2021-01-07 02:23

I am attempting to write a spoiler identification system so that any spoilers in a string are replaced with a specified spoiler character.

I want to match a string s

1 Answer
  • 2021-01-07 02:26

    A more direct solution

    This solution returns the text between balanced groups and omits empty or whitespace-only substrings.

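    import java.util.ArrayList;
    import java.util.List;

    // Collects the text found outside balanced markStart/markEnd pairs.
    public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current nesting depth
        int lastCloseBracket = 0; // index just past the most recent matched closing marker
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                // Entering a top-level group: keep the non-blank text that preceded it
                if (level == 1 && i != 0 && i != lastCloseBracket &&
                        !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                    subTreeList.add(s.substring(lastCloseBracket, i).trim());
                }
            } else if (c == markEnd) {
                // A closing marker with no matching opener is treated as plain text
                if (level > 0) {
                    level--;
                    lastCloseBracket = i + 1;
                }
            }
        }
        // Keep any non-blank text after the last balanced group
        if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
            subTreeList.add(s.substring(lastCloseBracket).trim());
        }
        return subTreeList;
    }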
    

    Then use it like this:

    String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
    List<String> between_balanced =  getStrsBetweenBalancedSubstrings(input, '[', ']');
    System.out.println("Result: " + between_balanced);
    // => Result: [Jim ate a, with, and, and ], and more here]
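
Since the question asks for spoilers to be replaced rather than split out, the same depth counter can be adapted to mask each balanced group in place. The following is only a sketch under assumptions: the class `SpoilerMask`, the helper `maskBalanced`, and the choice to mask the markers together with their contents are all hypothetical and not part of the original answer.

```java
class SpoilerMask {
    // Hypothetical helper: overwrites every outermost balanced group
    // (markers included) with maskChar, one mask character per source character.
    static String maskBalanced(String s, char open, char close, char maskChar) {
        StringBuilder out = new StringBuilder(s);
        int level = 0;   // current nesting depth
        int start = -1;  // index of the outermost opening marker
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                if (level == 0) {
                    start = i;
                }
                level++;
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) {
                    // The outermost group just closed: mask start..i inclusive
                    for (int j = start; j <= i; j++) {
                        out.setCharAt(j, maskChar);
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskBalanced("a [b [c]] d ] e", '[', ']', '*'));
        // prints: a ******* d ] e
    }
}
```

As in the function above, a closing marker without a matching opener is left untouched. In this sketch an opener that is never closed is also left unmasked, which may or may not be the behavior you want.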
    

    Original answer (more complex; shows a way to extract bracketed substrings that contain nested brackets)

    You can also extract all substrings enclosed in balanced brackets and then split the input on them:

    String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
    List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
    System.out.println("Balanced ones: " + balanced);
    List<String> rx_split = new ArrayList<String>();
    for (String item : balanced) {
        rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
    }
    String rx = String.join("|", rx_split);
    System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
    

    This function finds all outermost []-balanced substrings (nested groups are returned as part of their enclosing group):

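    // Returns each outermost balanced substring; includeMarkers controls
    // whether the enclosing markStart/markEnd characters are kept.
    public static List<String> getBalancedSubstrings(String s, Character markStart,
                                         Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current nesting depth
        int lastOpenBracket = -1; // start index of the current outermost group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenBracket = (includeMarkers ? i : i + 1);
                }
            }
            else if (c == markEnd) {
                // Depth returning to zero means an outermost group just closed
                if (level == 1) {
                    subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
                }
                // Ignore closing markers that have no matching opener
                if (level > 0) level--;
            }
        }
        return subTreeList;
    }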
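
If the inner groups are needed as separate items too, one possible variation is to track the opening positions on a stack instead of a single depth counter. This is a sketch of a different technique, not the answer's method; the class `NestedSpans` and the helper `allBalanced` are hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedSpans {
    // Hypothetical helper: returns every balanced span, nested ones included,
    // in the order in which their closing markers appear.
    static List<String> allBalanced(String s, char open, char close) {
        List<String> spans = new ArrayList<>();
        Deque<Integer> opens = new ArrayDeque<>(); // indices of unmatched openers
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                opens.push(i);
            } else if (c == close && !opens.isEmpty()) {
                // Pair this closer with the most recent unmatched opener
                spans.add(s.substring(opens.pop(), i + 1));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(allBalanced("x [dried [onions]] y ] z", '[', ']'));
        // prints: [[onions], [dried [onions]]]
    }
}
```

Inner spans come out before the spans that enclose them, and a closing marker with no opener is skipped, consistent with the functions above.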
    

    See IDEONE demo

    Result of the code execution:

    Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
    In-betweens: [Jim ate a, with, and, and ]]
    

    Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the post How to split this “Tree-like” string in Java regex?
