I need to split a string on any of the following sequences:
1 or more spaces
0 or more spaces, followed by a comma, followed by 0 or more spaces
0 or more spaces, followed by "=>", followed by 0 or more spaces
For simplicity, I'm going to interpret your indication of "space" as "any whitespace" (\s).
Translating your spec more or less "word for word" means delimiting on any of:
1 or more spaces: \s+
0 or more spaces (\s*), followed by a comma (,), followed by 0 or more spaces (\s*): \s*,\s*
0 or more spaces (\s*), followed by a "=>" (=>), followed by 0 or more spaces (\s*): \s*=>\s*
To match any of the above: (\s+|\s*,\s*|\s*=>\s*)
However, your spec can be "reduced" to: 0 or more spaces (\s*), followed by a space, a comma, or "=>" ((\s|,|=>)), followed by 0 or more spaces (\s*).
Put it all together: \s*(\s|,|=>)\s*
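The reduced form avoids some corner cases in which the strictly translated form produces unexpected empty "matches". In the strict form the alternatives are tried left to right, so \s+ can consume the whitespace before a comma or "=>" as a delimiter of its own; the comma or "=>" is then matched as a second delimiter, and the empty string between the two ends up in the result.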
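To make that corner case concrete, here is a minimal sketch that runs both forms over the same short input. The class name SplitComparison, its helper methods, and the sample string "four , five" are illustrative assumptions, not part of the original question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and sample input are illustrative only.
class SplitComparison {
    // Strictly translated form: alternatives are tried left to right.
    static final Pattern STRICT = Pattern.compile("(\\s+|\\s*,\\s*|\\s*=>\\s*)");
    // "Reduced" form: surrounding whitespace is consumed as part of one match.
    static final Pattern REDUCED = Pattern.compile("\\s*(\\s|=>|,)\\s*");

    static String[] splitStrict(final String input) {
        return STRICT.split(input);
    }

    static String[] splitReduced(final String input) {
        return REDUCED.split(input);
    }

    public static void main(final String[] args) {
        final String input = "four , five";
        // Strict: \s+ matches the space before the comma, then \s*,\s* matches ", ",
        // so an empty token appears between the two delimiters.
        System.out.println(Arrays.toString(splitStrict(input)));  // [four, , five]
        // Reduced: " , " is consumed as a single delimiter.
        System.out.println(Arrays.toString(splitReduced(input))); // [four, five]
    }
}
```

Moving \s+ to the end of the strict alternation would likely avoid the same problem, but the reduced form does not depend on the order of the alternatives in this way.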
Here's some code:
import java.util.regex.Pattern;

public class Temp {
    // Strictly translated form:
    //private static final String REGEX = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";
    // "Reduced" form:
    private static final String REGEX = "\\s*(\\s|=>|,)\\s*";

    private static final String INPUT =
        "one two,three=>four , five six => seven,=>";

    public static void main(final String[] args) {
        final Pattern p = Pattern.compile(REGEX);
        final String[] items = p.split(INPUT);
        // Shorthand for above:
        // final String[] items = INPUT.split(REGEX);
        for (final String s : items) {
            System.out.println("Match: '" + s + "'");
        }
    }
}
Output:
Match: 'one'
Match: 'two'
Match: 'three'
Match: 'four'
Match: 'five'
Match: 'six'
Match: 'seven'
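Note that the trailing ",=>" in the input adds no extra lines to the output. That is because split with the default limit discards trailing empty strings. If you need to see them, a negative limit keeps them. The sketch below assumes a hypothetical class name, TrailingTokens, with two helper methods; only the regex and the input come from the code above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex and input come from the answer above.
class TrailingTokens {
    static final Pattern REDUCED = Pattern.compile("\\s*(\\s|=>|,)\\s*");
    static final String INPUT = "one two,three=>four , five six => seven,=>";

    // Default behaviour (limit 0): trailing empty strings are removed.
    static String[] splitDroppingTrailing() {
        return REDUCED.split(INPUT);
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepingTrailing() {
        return REDUCED.split(INPUT, -1);
    }

    public static void main(final String[] args) {
        System.out.println(splitDroppingTrailing().length); // 7
        // Two extra empty tokens: one between "," and "=>", one after "=>".
        System.out.println(splitKeepingTrailing().length);  // 9
    }
}
```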