Question
I'm trying to write a regex that extracts data from report files. The tricky part is that this single regex has to match several report file formats: it should always match, even when some optional groups are absent.
Take the following report file contents (note that file #2 is missing the "val2" part):
- File #1: "-val1-test-val2-result-val3-done-"
  - Expected result:
    - Val1 group: test
    - Val2 group: result
    - Val3 group: done
- File #2: "-val1-test-val3-done-"
  - Expected result:
    - Val1 group: test
    - Val2 group: (empty)
    - Val3 group: done
I tried the following regexes:
Regex #1 (normal): "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-"
Problem: file #1 works fine, but the regex doesn't match file #2 at all, so I get no group values.
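Since the code will end up on Java/Android, a minimal Java reproduction of the regex #1 problem may be useful. This is only a sketch: the class name `Regex1Repro` and the helper `matches` are hypothetical, and it assumes Java 7 or later for the `(?<name>...)` named-group syntax. Because every `-valN-` section in regex #1 is mandatory, `find()` fails on file #2:

```java
import java.util.regex.Pattern;

public class Regex1Repro {
    // Regex #1 from the question: the "-val2-" section is required,
    // so any input without it cannot match at all.
    static final Pattern REGEX_1 = Pattern.compile(
            "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-");

    // Hypothetical helper: true if regex #1 finds a match anywhere in the input.
    public static boolean matches(String content) {
        return REGEX_1.matcher(content).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("-val1-test-val2-result-val3-done-")); // true
        System.out.println(matches("-val1-test-val3-done-"));             // false
    }
}
```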
Regex #2 (non-greedy): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?))?-val3-(?<val3>.+?)-"
Regex #3 (boolean OR): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?)|(.*?))-val3-(?<val3>.+?)-"
Regex #4 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Regex #5 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?)))-val3-(?<val3>.+?)-"
Regex #6 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Problem: file #2 works as expected, but the val2 group for file #1 is always empty.
Conclusion: it seems that even when an optional section is present, the regex prefers an empty group value over the actual value. Is there a way to force the optional groups to capture their value when it is present, and return (empty) only when it is not?
Note: I'm using the latest .NET Framework, and the code will be ported to Java (Android). I'm trying to avoid multiple regex operations for performance and bandwidth reasons.
Can anyone help me with this?
Answer 1:
It is possible if we make some assumptions:
- values might be missing, but they are always in the same order
- the first value is always present
- there is a delimiter before and after the part we are looking for
-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-
https://regex101.com/r/yY6vF9/1
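As a sketch of how the answer's pattern could be used on the Java (Android) side, the class below compiles the pattern once and reads all three values from a single `find()`, mapping a group that did not participate (for which `Matcher.group` returns `null`) to an empty string. The class name `ReportParser`, the `parse` helper, and the named groups are illustrative additions, not part of the answer; the pattern is otherwise the answer's, so it carries the same assumptions, in particular that values never contain `-`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportParser {
    // The answer's pattern with (hypothetical) named groups added.
    // [^-]+ stops at the next "-" delimiter, so a value cannot swallow the
    // following "-val2-" or "-val3-" marker. Each "(?:...|)" tries the
    // section first and falls back to the empty alternative if it is absent.
    private static final Pattern REPORT = Pattern.compile(
            "-val1-(?<val1>[^-]+)(?:-val2-(?<val2>[^-]+)|)(?:-val3-(?<val3>[^-]+)|)-");

    // Returns val1..val3 in order ("" for a missing section),
    // or an empty map if the content does not match at all.
    public static Map<String, String> parse(String content) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = REPORT.matcher(content);
        if (m.find()) {
            for (String name : new String[] {"val1", "val2", "val3"}) {
                String value = m.group(name); // null if the group did not participate
                result.put(name, value == null ? "" : value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-val1-test-val2-result-val3-done-"));
        // {val1=test, val2=result, val3=done}
        System.out.println(parse("-val1-test-val3-done-"));
        // {val1=test, val2=, val3=done}
    }
}
```

If named groups are a concern on older Android versions, the answer's numbered groups (`m.group(1)`, `m.group(2)`, `m.group(3)`) behave the same way. The pattern should also work unchanged in .NET, where an unmatched group's `Value` is an empty string rather than `null`.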
Source: https://stackoverflow.com/questions/31772440/force-parsing-optional-groups