Split string delimited by comma without respect to commas in brackets

I've got a string like

s="abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";

and I want to split it int

3 Answers

北荒

2020-12-11 22:59
Assuming ( and ) are neither nested nor escaped, you can split using:
```
String[] arr = input.split(",(?![^()]*\\))\\s*");
```
,(?![^()]*\)) matches a comma only if it is NOT followed by a run of non-parenthesis characters and then a ), so commas inside ( and ) are skipped. The trailing \s* also consumes any whitespace after the comma.
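
As a quick check, the pattern can be run against the sample string from the question. This is only a minimal sketch: the class name `SplitOutsideParens` and the helper `splitTopLevel` are made up for illustration, and it assumes, as stated above, that parentheses are neither nested nor escaped.

```java
class SplitOutsideParens {
    // Hypothetical helper: split on commas that are not inside a
    // (non-nested) pair of parentheses, dropping whitespace after the comma.
    static String[] splitTopLevel(String input) {
        return input.split(",(?![^()]*\\))\\s*");
    }

    public static void main(String[] args) {
        String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
        // Brackets make leading or trailing whitespace visible, if any.
        for (String part : splitTopLevel(s)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

For the sample string this yields six fields, with the parenthesized groups left intact.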

名媛妹妹

2020-12-11 23:15

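This also works for the sample string. Note that it relies on every top-level comma being followed by a whitespace character, and on commas inside parentheses either having no whitespace after them or being followed by a single word and then ).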

```
public class Main {
    public static void main(String[] args) {
        String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
        // Split on ", " unless what follows is a single word and then ")"
        String[] arr = s.split(",\\s(?!\\w+\\))");
        for (String str : arr) {
            System.out.println(str);
        }
    }
}
```

Output:
```
abc
3rncd (23uh, sdfuh)
32h(q23q)89 (as)
dwe8h
edt (1,wer,345,rtz,tr t)
nope
```

花落未央

2020-12-11 23:18
FWIW: I wouldn't use the lookahead solution for this.

If you have a lot of commas, the total cost of the lookahead grows roughly
quadratically with the number of commas.

The reason is that a lookahead used like this can be open-ended.
If there is a possibility that nothing terminates the lookahead,
it's not a good idea, especially on a large sample of data.

Every time the regex finds a comma, it has to evaluate (?![^()]*\))

That lookahead scans forward until it finds a parenthesis or reaches the end of the input.
That means it scans past commas as well.

If you have a string like this: asdf,asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf
the progression is:

Match 1: found , looked ahead at all of this asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf
Match 2: found , looked ahead at all of this asdf,aasdf,aaaasdf,asdf,aasdf,asdf
Match 3: found , looked ahead at all of this aasdf,aaaasdf,asdf,aasdf,asdf
Match 4: found , looked ahead at all of this aaaasdf,asdf,aasdf,asdf
Match 5: found , looked ahead at all of this asdf,aasdf,asdf
Match 6: found , looked ahead at all of this aasdf,asdf
Match 7: found , looked ahead at all of this asdf

It's a pretty small string to be matching all of that stuff.
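
To put a rough number on that progression, here is a small sketch that counts how many characters `[^()]*` would consume across all commas. It is only a simplified cost model, not a measurement of what the regex engine actually does (backtracking is ignored), and the names `LookaheadCost` and `charsScanned` are made up for illustration.

```java
class LookaheadCost {
    // Rough model: for each comma, count the characters that `[^()]*` would
    // consume, i.e. everything up to the next parenthesis or the end of input.
    static long charsScanned(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != ',') continue;
            int j = i + 1;
            while (j < s.length() && s.charAt(j) != '(' && s.charAt(j) != ')') j++;
            total += j - (i + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "asdf,asdf,asdf,aasdf,aaaasdf,asdf,aasdf,asdf";
        String doubled = s + "," + s;
        System.out.println(s.length() + " chars -> " + charsScanned(s) + " scanned");
        System.out.println(doubled.length() + " chars -> " + charsScanned(doubled) + " scanned");
    }
}
```

Under this model, doubling the input roughly quadruples the number of characters scanned, which is the growth pattern to worry about on large inputs without parentheses.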

It's never good to use a regex like that, for split or any kind of matching.

I'd just match the field values in a global find.
```
"(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)"  
```
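
A minimal sketch of how that pattern could be used in Java with `Matcher.find()`, collecting capture group 1 for each field. The class name `FieldMatcher` and the method `fields` are made up for illustration, and the same assumption applies as before: parentheses are not nested. Empty fields are not specifically handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldMatcher {
    // A field is a run of non-comma, non-parenthesis characters, optionally
    // interleaved with (non-nested) parenthesized groups. Each match must
    // start at the beginning of input or right where the previous one ended.
    private static final Pattern FIELD = Pattern.compile(
        "(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)");

    // Hypothetical helper: return every field captured by group 1.
    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "abc, 3rncd (23uh, sdfuh), 32h(q23q)89 (as), dwe8h, edt (1,wer,345,rtz,tr t), nope";
        for (String field : fields(s)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because every match is anchored with \G to the end of the previous one, the engine never rescans the rest of the input for each comma.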
Here is a simple benchmark that demonstrates the latency
a lookahead like this can cause:

Sample: 260 characters, 49 commas
```
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
asdf,asdf,asdf,asdf,asdf,asdf,asdf,
```
Benchmark
```
Regex1:   (?:\A|\G,\s*)([^(),]*(?:(?:\([^()]*\))[^(),]*)*)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   50
Elapsed Time:    2.97 s,   2972.45 ms,   2972454 µs


Regex2:   ,(?![^()]*\))\s*
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   49
Elapsed Time:    21.59 s,   21586.81 ms,   21586811 µs
```
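When the sample is doubled, the time gets even worse: Regex1 takes about twice as long (2.97 s to 5.89 s), while Regex2 takes almost four times as long (21.59 s to 83.06 s).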
```
Regex1:   (?:\A|\G,\s*)([^(),]*(?:(?:\([^()]*\))[^(),]*)*)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   99
Elapsed Time:    5.89 s,   5887.16 ms,   5887163 µs


Regex2:   ,(?![^()]*\))\s*
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   98
Elapsed Time:    83.06 s,   83063.77 ms,   83063772 µs
```
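
For anyone who wants to try a comparison like this in Java, here is a rough harness. It is not the tool that produced the numbers above, so absolute timings will differ, and the warm-up is crude. The names `SplitBenchmark`, `sample`, `countMatches` and `time` are made up for illustration, and the generated sample only approximates the one shown (same fields per line, `\n` line endings assumed).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitBenchmark {
    // Build a sample shaped like the one above: `lines` lines of seven "asdf," fields.
    static String sample(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            for (int j = 0; j < 7; j++) sb.append("asdf,");
            sb.append('\n');
        }
        return sb.toString();
    }

    // Count how many times the pattern matches in a single pass over the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Run `iterations` full passes and return the elapsed time in nanoseconds.
    static long time(Pattern p, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) countMatches(p, input);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Pattern globalFind = Pattern.compile("(?:\\A|\\G,\\s*)([^(),]*(?:(?:\\([^()]*\\))[^(),]*)*)");
        Pattern lookahead = Pattern.compile(",(?![^()]*\\))\\s*");
        for (int lines : new int[] {7, 14}) {
            String s = sample(lines);
            time(globalFind, s, 1000); // crude warm-up before measuring
            time(lookahead, s, 1000);
            System.out.printf("%d lines: global find %d ms (%d matches), lookahead %d ms (%d matches)%n",
                lines,
                time(globalFind, s, 10000) / 1_000_000, countMatches(globalFind, s),
                time(lookahead, s, 10000) / 1_000_000, countMatches(lookahead, s));
        }
    }
}
```

The match counts should line up with the benchmark above (one more match for the global find than for the lookahead, because of the empty field after the final comma); the interesting part is how the two timings change when the sample is doubled.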