Split string delimited by comma without respect to commas in brackets

情话喂你 2020-12-11 22:36

I've got a string like

s="abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";

and I want to split it int

3 Answers
  • 2020-12-11 22:59

    Assuming that ( and ) are not nested and not escaped, you can split using:

    String[] arr = input.split(",(?![^()]*\\))\\s*");
    

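    ,(?![^()]*\)) matches a comma only if it is NOT followed by a run of non-parenthesis characters and then a ). In other words, it skips any comma whose next parenthesis is a closing one, which is exactly the set of commas inside ( and ). The trailing \s* in the split pattern also consumes the whitespace after each matched comma.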
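
As a rough sketch of how this behaves, the snippet below runs the same pattern against the string from the question and against a nested input, where the stated assumption no longer holds. The class and method names (`LookaheadSplitDemo`, `splitOutsideParens`) are made up for illustration.

```java
class LookaheadSplitDemo {
    // Same pattern as in the answer: split on a comma that is not followed by
    // "non-parenthesis characters, then )", and swallow trailing whitespace.
    static String[] splitOutsideParens(String input) {
        return input.split(",(?![^()]*\\))\\s*");
    }

    public static void main(String[] args) {
        String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
        for (String part : splitOutsideParens(s)) {
            System.out.println(part);
        }

        // Nested parentheses break the assumption: the comma after "a" is followed
        // by "(" before any ")", so the lookahead treats it as a top-level comma.
        System.out.println(String.join(" | ", splitOutsideParens("f(a, (b) c), d")));
        // prints: f(a | (b) c) | d
    }
}
```

The second call shows why the "not nested" assumption matters: the pattern only inspects the next parenthesis, not the nesting depth.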

  • 2020-12-11 23:15

    This will also work for you.

    public class SplitDemo {
        public static void main(String[] args) {
            String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
            // Split on comma + one whitespace, unless a single word followed by ")" comes next.
            String[] arr = s.split(",\\s(?!\\w+\\))");
            for (String str : arr) {
                System.out.println(str);
            }
        }
    }
    

    Output:

    abc
    3rncd (23uh, sdfuh)
    32h(q23q)89 (as)
    dwe8h
    edt (1,wer,345,rtz,tr t)
    nope
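
One caveat worth sketching: this pattern only skips a comma that is followed by exactly one whitespace character, a single word, and then `)`. The commas in `(1,wer,345,rtz,tr t)` survive in the sample only because no whitespace follows them, so `,\s` never matches there. The snippet below, with the hypothetical class name `WordLookaheadLimits`, shows an input where the same pattern splits inside the parentheses.

```java
class WordLookaheadLimits {
    // Same pattern as in the answer above.
    static String[] split(String input) {
        return input.split(",\\s(?!\\w+\\))");
    }

    public static void main(String[] args) {
        // "c d)" is two words before ")", so \w+\) does not match and the
        // comma inside the parentheses is treated as a separator.
        System.out.println(String.join(" | ", split("a (b, c d), e")));
        // prints: a (b | c d) | e

        // A top-level comma with no following whitespace is never matched.
        System.out.println(String.join(" | ", split("a,b")));
        // prints: a,b
    }
}
```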
    
  • 2020-12-11 23:18

    FWIW: I wouldn't use the lookahead solution for this.

    If you have a lot of commas, the cost of the lookahead grows
    roughly quadratically with the number of commas.

    The reason is that a lookahead used like this can be open-ended.
    If there is a possibility that nothing terminates the lookahead,
    it's not a good idea, especially on a large sample of data.

    Every time the regex finds a comma, it has to evaluate (?![^()]*\))

    That lookahead scans forward until it finds a parenthesis or the end of the string.
    That means it scans over all the following commas as well.

    If you have a string like this asdf,asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf
    the progression is

    Match 1: found , looked ahead at all of this asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf
    Match 2: found , looked ahead at all of this asdf,aasdf,aaaasdf,asdf,aasdf,asdf
    Match 3: found , looked ahead at all of this aasdf,aaaasdf,asdf,aasdf,asdf
    Match 4: found , looked ahead at all of this aaaasdf,asdf,aasdf,asdf
    Match 5: found , looked ahead at all of this asdf,aasdf,asdf
    Match 6: found , looked ahead at all of this aasdf,asdf
    Match 7: found , looked ahead at all of this asdf
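
To put numbers on the progression above, here is a small model of the work involved. It does not instrument a real regex engine; it simply counts the characters that `[^()]*` would consume after each comma, on the assumption that this scan dominates the cost. The names `LookaheadCost`, `scanLength`, `totalScan` and `fields` are hypothetical.

```java
import java.util.Collections;

class LookaheadCost {
    // Characters [^()]* consumes when the lookahead starts right after commaIdx:
    // everything up to the next parenthesis or the end of the string.
    static int scanLength(String s, int commaIdx) {
        int i = commaIdx + 1;
        while (i < s.length() && s.charAt(i) != '(' && s.charAt(i) != ')') {
            i++;
        }
        return i - (commaIdx + 1);
    }

    // Sum of the scan lengths over every comma in the string.
    static long totalScan(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',') {
                total += scanLength(s, i);
            }
        }
        return total;
    }

    // Builds "asdf,asdf,...,asdf" with n fields and no parentheses.
    static String fields(int n) {
        return String.join(",", Collections.nCopies(n, "asdf"));
    }

    public static void main(String[] args) {
        // The 44-character string from the answer: 39+34+29+23+15+10+4
        System.out.println(totalScan("asdf,asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf")); // prints 154
        System.out.println(totalScan(fields(50)));  // prints 6076
        System.out.println(totalScan(fields(100))); // prints 24651
    }
}
```

Doubling the number of fields roughly quadruples the total in this model, which is the quadratic growth described above.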

    It's a pretty small string to be matching all of that stuff.

    It's never good to use a regex like that, for split or any kind of matching.


    I'd just match the field values in a global find.

    "(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)"  
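
A minimal sketch of how that pattern might be used with `Matcher.find()`, collecting capture group 1 as the field value. The class and method names (`FieldFinder`, `fields`) are illustrative, and the same "no nested parentheses" assumption applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldFinder {
    // The pattern from the answer: anchor at the start of input (\A) or at the
    // end of the previous match (\G) plus a comma, then capture one field that
    // may contain parenthesised groups.
    private static final Pattern FIELD = Pattern.compile(
            "(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)");

    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the field without the leading comma
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
        for (String field : fields(s)) {
            System.out.println(field);
        }
    }
}
```

Because each match consumes its field and stops at the next top-level comma, the input is walked once from left to right instead of being rescanned from every comma.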
    

    Here is a simple benchmark that demonstrates the latency
    a lookahead like this can cause:

    Sample: 260 characters, 49 commas

    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    asdf,asdf,asdf,asdf,asdf,asdf,asdf,
    

    Benchmark

    Regex1:   (?:\A|\G,\s*)([^(),]*(?:(?:\([^()]*\))[^(),]*)*)
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   50
    Elapsed Time:    2.97 s,   2972.45 ms,   2972454 µs
    
    
    Regex2:   ,(?![^()]*\))\s*
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   49
    Elapsed Time:    21.59 s,   21586.81 ms,   21586811 µs
    

    When the sample is doubled, the time gets even worse: Regex1 roughly doubles (2.97 s to 5.89 s), while Regex2 roughly quadruples (21.59 s to 83.06 s).

    Regex1:   (?:\A|\G,\s*)([^(),]*(?:(?:\([^()]*\))[^(),]*)*)
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   99
    Elapsed Time:    5.89 s,   5887.16 ms,   5887163 µs
    
    
    Regex2:   ,(?![^()]*\))\s*
    Options:  < none >
    Completed iterations:   50  /  50     ( x 1000 )
    Matches found per iteration:   98
    Elapsed Time:    83.06 s,   83063.77 ms,   83063772 µs
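
For anyone who wants to try a comparison like this in Java, here is a rough timing sketch. It is not the tool that produced the figures above, it is not a rigorous benchmark (no warm-up control, no JMH), and absolute times will differ by machine and JVM. The class name `SplitRegexTiming`, the helper names, the `\n` line endings and the iteration count are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitRegexTiming {
    // Regex1 from the benchmark: match fields in a global find.
    static final Pattern FIND_FIELDS = Pattern.compile(
            "(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)");
    // Regex2 from the benchmark: match top-level commas with a lookahead.
    static final Pattern LOOKAHEAD_COMMA = Pattern.compile(",(?![^()]*\\))\\s*");

    // Builds `rows` lines of seven "asdf," fields each, like the sample above.
    static String sample(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int f = 0; f < 7; f++) {
                sb.append("asdf,");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Wall-clock time in milliseconds for `iterations` full passes over the input.
    static long timeMillis(Pattern p, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            countMatches(p, input);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 2000; // arbitrary; raise it for more stable numbers
        for (int rows : new int[] {7, 14}) {
            String input = sample(rows);
            System.out.println("rows=" + rows
                    + " find: " + countMatches(FIND_FIELDS, input) + " matches, "
                    + timeMillis(FIND_FIELDS, input, iterations) + " ms"
                    + " | lookahead: " + countMatches(LOOKAHEAD_COMMA, input) + " matches, "
                    + timeMillis(LOOKAHEAD_COMMA, input, iterations) + " ms");
        }
    }
}
```

The match counts (50 and 49 for seven rows, 99 and 98 for fourteen) line up with the "Matches found per iteration" figures above, which is a quick check that both patterns see the same input.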
    