Java Regex Help: Splitting String on spaces, “=>”, and commas

鱼传尺愫 2021-02-05 21:48

I need to split a string on any of the following sequences:

1 or more spaces
0 or more spaces, followed by a comma, followed by 0 or more spaces
0 or more spaces, followed by "=>", followed by 0 or more spaces

3 Answers
  •  不思量自难忘°
    2021-02-05 22:04

    Strictly translated

    For simplicity, I'm going to interpret your indication of "space" ( ) as "any whitespace" (\s).

    Translated more or less "word for word", your spec says to delimit on any of:

    • 1 or more spaces
      • \s+
    • 0 or more spaces (\s*), followed by a comma (,), followed by 0 or more spaces (\s*)
      • \s*,\s*
    • 0 or more spaces (\s*), followed by a "=>" (=>), followed by 0 or more spaces (\s*)
      • \s*=>\s*

    To match any of the above: (\s+|\s*,\s*|\s*=>\s*)
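
    As a quick sanity check of the alternation above, here is a minimal sketch that applies the strictly translated pattern to an input containing exactly one delimiter of each kind. The class and method names (StrictFormDemo, splitStrict) are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StrictFormDemo {

    // Strictly translated form: a whitespace run, OR a comma, OR "=>",
    // the latter two with optional whitespace on either side.
    static final Pattern STRICT = Pattern.compile("(\\s+|\\s*,\\s*|\\s*=>\\s*)");

    // Hypothetical helper: split the input on the strict pattern.
    static String[] splitStrict(final String input) {
        return STRICT.split(input);
    }

    public static void main(final String[] args) {
        // One delimiter of each kind, none of them padded with whitespace.
        System.out.println(Arrays.toString(splitStrict("one two,three=>four")));
        // prints [one, two, three, four]
    }
}
```

    This input deliberately has no whitespace around the comma or the "=>", so each delimiter is matched by exactly one alternative.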

    Reduced form

    However, your spec can be "reduced" to:

    • 0 or more spaces
      • \s*
    • followed by either a space, comma, or "=>"
      • (\s|,|=>)
    • followed by 0 or more spaces
      • \s*

    Put it all together: \s*(\s|,|=>)\s*

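    The reduced form avoids a corner case in the strictly translated form, which can produce unexpected empty "matches": its \s+ alternative is tried first, so it can consume the whitespace in front of a comma or "=>" and leave the rest of the delimiter to match separately.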
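
    To make that corner case concrete, here is a small sketch comparing the two forms on an input where a comma has whitespace on both sides. The class name CornerCaseDemo and the sample input are assumptions for illustration only.

```java
import java.util.Arrays;

class CornerCaseDemo {

    static final String STRICT  = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";
    static final String REDUCED = "\\s*(\\s|,|=>)\\s*";

    public static void main(final String[] args) {
        final String input = "a , b";

        // Strict form: "\\s+" matches the first space on its own, then ", "
        // matches as a second delimiter, leaving an empty token in between.
        System.out.println(Arrays.toString(input.split(STRICT)));   // prints [a, , b]

        // Reduced form: the leading "\\s*" absorbs the space, so " , " is
        // consumed as a single delimiter.
        System.out.println(Arrays.toString(input.split(REDUCED)));  // prints [a, b]
    }
}
```

    The same thing happens with "a => b": the strict form yields an empty token between "a" and "b", while the reduced form does not.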

    Code

    Here's some code:

    import java.util.regex.Pattern;
    
    public class Temp {
    
        // Strictly translated form:
        //private static final String REGEX = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";
    
        // "Reduced" form:
        private static final String REGEX = "\\s*(\\s|=>|,)\\s*";
    
        private static final String INPUT =
            "one two,three=>four , five   six   => seven,=>";
    
        public static void main(final String[] args) {
            final Pattern p = Pattern.compile(REGEX);
            final String[] items = p.split(INPUT);
            // Shorthand for above:
            // final String[] items = INPUT.split(REGEX);
            for(final String s : items) {
                System.out.println("Match: '"+s+"'");
            }
        }
    }
    

    Output:

    Match: 'one'
    Match: 'two'
    Match: 'three'
    Match: 'four'
    Match: 'five'
    Match: 'six'
    Match: 'seven'
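
    Note that the input ends with ",=>" yet no empty strings are printed. That is because split with the default limit discards trailing empty strings. The sketch below (the class name TrailingEmptyDemo is made up) shows the difference when a negative limit is passed, which keeps them.

```java
class TrailingEmptyDemo {

    static final String REGEX = "\\s*(\\s|=>|,)\\s*";
    static final String INPUT =
        "one two,three=>four , five   six   => seven,=>";

    public static void main(final String[] args) {
        // Default limit (0): trailing empty strings are dropped.
        System.out.println(INPUT.split(REGEX).length);      // prints 7

        // Negative limit: trailing empty strings are kept, namely the one
        // between "," and "=>" and the one after "=>".
        System.out.println(INPUT.split(REGEX, -1).length);  // prints 9
    }
}
```

    If empty tokens in the middle of the input matter to you, filter them out explicitly rather than relying on this trailing-only behaviour.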
    
