Priority in regex manipulating

Submitted by 南笙酒味 on 2019-12-20 06:18:33

Question


I wrote some Java code to split a string into an array of strings. First I split the string using the regex pattern "\\,\\,|\\,", and then I split it using the pattern "\\,|\\,\\,". Why is the output of the first different from the output of the second?

public class Test2 {
    public static void main(String[] args){

        String regex1 = "\\,\\,|\\,";
        String regex2 = "\\,|\\,\\,"; 

        String a  = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";
        String ss[] = a.split(regex1); 

        int index = 0; 
        for(String m : ss){
            System.out.println((index++)+ ": "+m+"|"); 
        }
    }
} 

Output when using regex1:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: 3000054789|

And when using regex2:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: |
13: 3000054789|

I need an explanation of how the regex engine works in this situation.


Answer 1:


How regex works: the state machine always reads from left to right, and at each position it tries the alternatives in the order they are written. So ",|,," is equivalent to ",": the first alternative always matches, and the second one is never reached.

",,|," is equivalent to ",,?".

However, you should use ",,?" instead so there's no backtracking.
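The equivalences above can be checked with a small sketch. The class name AlternationOrderDemo, the helper splitToString, and the short input "A,,B,C" are made up for illustration; only String.split and Arrays.toString come from the JDK.

```java
import java.util.Arrays;

class AlternationOrderDemo {
    // Hypothetical helper: split the input and render the pieces as one string.
    static String splitToString(String regex, String input) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String input = "A,,B,C";
        // ",|,," behaves like ",": the empty field between the two commas is kept.
        System.out.println(splitToString(",|,,", input)); // [A, , B, C]
        System.out.println(splitToString(",", input));    // [A, , B, C]
        // ",,|," behaves like ",,?": the double comma is consumed as one delimiter.
        System.out.println(splitToString(",,|,", input)); // [A, B, C]
        System.out.println(splitToString(",,?", input));  // [A, B, C]
    }
}
```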




Answer 2:


Judging by the two results, the split method tries the alternatives in the order they are written: "," first for regex2, ",," first for regex1. With regex2, the single "," always matches first, so ",," is never matched as a unit; its two commas are treated as two separate delimiters. That's why an empty string is detected where ",," occurs when you use regex2.

So for your regex to be useful, you need to write the longer alternative first.
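One way to apply that advice mechanically is to sort the literal delimiters by length before joining them. This is only a sketch: the class LongestFirst and its alternation helper are hypothetical names, and it assumes the delimiters are plain literals (hence Pattern.quote).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LongestFirst {
    // Hypothetical helper: builds an alternation with the longest delimiter first,
    // so a shorter alternative cannot match before a longer one that starts with it.
    static String alternation(List<String> delimiters) {
        List<String> sorted = new ArrayList<>(delimiters);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder regex = new StringBuilder();
        for (String d : sorted) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(d)); // treat each delimiter as literal text
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        // The order in which the delimiters are passed no longer matters.
        String regex = alternation(Arrays.asList(",", ",,"));
        System.out.println(Arrays.toString("X,,Y,Z".split(regex))); // [X, Y, Z]
    }
}
```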




Answer 3:


The alternatives are evaluated from left to right. In regex1, \\,\\, is tried first, and \\, is tried only if that fails. That's why the string at index 12 is not empty: \\,\\, matches at that point. With regex2, every delimiter is matched by \\,, hence the empty string.
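To see what each pattern actually consumes, the match positions can be listed with java.util.regex.Matcher. A minimal sketch, assuming a shortened input "A,,B" in place of the original string; the class MatchSpans and the helper spans are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Hypothetical helper: returns each match as "start-end" (end is exclusive).
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "A,,B";
        // regex1: the double comma is one match covering indexes 1 and 2.
        System.out.println(spans("\\,\\,|\\,", input)); // [1-3]
        // regex2: two one-character matches, with nothing between them.
        System.out.println(spans("\\,|\\,\\,", input)); // [1-2, 2-3]
    }
}
```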




Answer 4:


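Case 1: split by ",,", else ",".
The first alternative matches only the one ",," in the input; every other delimiter is split by ",".

Case 2: split by ",", else ",,".
The first alternative matches every comma, so ",," is never tried. "word1,,word2" is split into "word1" and ",word2".
Then ",word2" is split into "" (an empty string) and "word2".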
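The two cases can be traced with a hand-rolled splitter that collects the text between consecutive matches. This is a simplified sketch, not how String.split is implemented; in particular it does not drop trailing empty strings. The class ManualSplit and the helper pieces are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualSplit {
    // Hypothetical helper: every piece is the text between the end of one match
    // and the start of the next one.
    static List<String> pieces(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int from = 0;
        while (m.find()) {
            out.add(input.substring(from, m.start()));
            from = m.end();
        }
        out.add(input.substring(from)); // remainder after the last delimiter
        return out;
    }

    public static void main(String[] args) {
        // Case 1: ",," is consumed in one step, so no empty piece appears.
        System.out.println(pieces(",,|,", "word1,,word2")); // [word1, word2]
        // Case 2: two adjacent one-comma matches leave an empty piece between them.
        System.out.println(pieces(",|,,", "word1,,word2")); // [word1, , word2]
    }
}
```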



Source: https://stackoverflow.com/questions/25179366/priority-in-regex-manipulating
