I am attempting to write a spoiler identification system so that any spoilers in a string are replaced with a specified spoiler character.
I want to match a string s
This solution omits empty or whitespace-only substrings:
public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;            // current nesting depth
    int lastCloseBracket = 0; // index just past the most recent matched closing marker
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            // A top-level opening marker ends an in-between chunk; keep it unless blank.
            if (level == 1 && i != 0 && i != lastCloseBracket &&
                    !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                subTreeList.add(s.substring(lastCloseBracket, i).trim());
            }
        } else if (c == markEnd) {
            // A closing marker with no open counterpart is ignored and stays in the text.
            if (level > 0) {
                level--;
                lastCloseBracket = i + 1;
            }
        }
    }
    // Keep whatever follows the last balanced substring, unless blank.
    if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
        subTreeList.add(s.substring(lastCloseBracket).trim());
    }
    return subTreeList;
}
Then use it as follows:
String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
List<String> between_balanced = getStrsBetweenBalancedSubstrings(input, '[', ']');
System.out.println("Result: " + between_balanced);
// => Result: [Jim ate a, with, and, and ], and more here]
You can also extract all substrings inside balanced brackets and then split the input on them:
// Needs java.util.ArrayList, java.util.Arrays, java.util.List and java.util.regex.Pattern
String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
System.out.println("Balanced ones: " + balanced);
List<String> rx_split = new ArrayList<String>();
for (String item : balanced) {
    // Quote each balanced substring literally and absorb the whitespace around it
    rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
}
String rx = String.join("|", rx_split);
System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
And this function will find all []-balanced substrings:
public static List<String> getBalancedSubstrings(String s, Character markStart,
        Character markEnd, Boolean includeMarkers) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;            // current nesting depth
    int lastOpenBracket = -1; // start of the current top-level substring
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            if (level == 1) {
                lastOpenBracket = (includeMarkers ? i : i + 1);
            }
        } else if (c == markEnd) {
            // Closing the outermost marker completes one balanced substring.
            if (level == 1) {
                subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
            }
            // Closing markers with no open counterpart are ignored.
            if (level > 0) level--;
        }
    }
    return subTreeList;
}
See IDEONE demo
Result of the code execution:
Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
In-betweens: [Jim ate a, with, and, and ]]
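To tie this back to the stated goal of replacing spoilers with a spoiler character, the same level counter can mask the balanced spans in place instead of extracting them. This is only a minimal sketch: the class name SpoilerMask, the method name maskBalanced, and the choice to mask the markers along with their contents are assumptions for illustration, not part of the code above.

```java
import java.util.Arrays;

public class SpoilerMask {
    // Hypothetical helper: overwrites every top-level balanced span
    // (markers included) with the mask character.
    static String maskBalanced(String s, char markStart, char markEnd, char mask) {
        char[] out = s.toCharArray();
        int level = 0;
        int start = -1; // index of the opening marker of the current top-level span
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) start = i;
            } else if (c == markEnd) {
                if (level == 1) {
                    // The span is confirmed balanced only now; toIndex is exclusive.
                    Arrays.fill(out, start, i + 1, mask);
                }
                if (level > 0) level--;
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(maskBalanced("a [b [c]] d ] e", '[', ']', '*'));
        // => a ******* d ] e
    }
}
```

Masking happens only when the outermost marker closes, so an opening marker that is never closed, or a stray closing marker, leaves the text untouched, just as in getBalancedSubstrings.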
Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the post How to split this “Tree-like” string in Java regex?