Force parsing optional groups

Submitted by 别来无恙 on 2019-12-23 16:54:53

Question


I'm trying to write a single regex that extracts data from report files. The tricky part is that this one regex has to match several report content formats, so I want it to match even when some optional groups are absent.

Take the following report file contents (note: #2 is missing the "val2" part):

  • File #1: "-val1-test-val2-result-val3-done-"
    • Expected Result:
      • Val1 Group: test
      • Val2 Group: result
      • Val3 Group: done
  • File #2: "-val1-test-val3-done-"
    • Expected Result:
      • Val1 Group: test
      • Val2 Group: (empty)
      • Val3 Group: done

I tried the following regex strings:

Regex #1(Normal): "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-"

Problem: File #1 works fine but on file #2, the regex is not matching so I don't have any group values.

Regex #2(Non-greedy): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?))?-val3-(?<val3>.+?)-"
Regex #3(Boolean OR): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?)|(.*?))-val3-(?<val3>.+?)-"
Regex #4(Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Regex #5(Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?)))-val3-(?<val3>.+?)-"
Regex #6(Conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"

Problem: File #2 works as expected but the val2 group of file #1 is always empty.

Conclusion: It seems that even when an optional group is present, the regex prefers an empty group value over the actual value. Is there a way to force the optional groups to capture their value when present and return (empty) only when they are not?

Note: I'm using the latest .NET Framework, and the code will be ported to Java (Android). I'm trying to avoid multiple operations because of performance and bandwidth concerns.

Could anyone help me with this?


Answer 1:


It is possible if we make some assumptions:

  1. Values might be missing, but they are always in the same order.
  2. The first value is always present.
  3. There is a delimiter before and after the part we are looking for.

 

-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-

https://regex101.com/r/yY6vF9/1
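
Since the question mentions a later port to Java, here is a minimal sketch of how the pattern above might be used there. The class name `OptionalGroups` and the helper `extract` are hypothetical names chosen for illustration. The sketch keeps the answer's unnamed groups and maps a group that did not participate in the match (which `Matcher.group(int)` reports as `null`) to an empty string. Note that `[^-]+` means a value cannot itself contain a hyphen.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroups {
    // The pattern from the answer, unchanged. Each optional part is an
    // alternation between "-valN-<value>" and the empty string.
    private static final Pattern REPORT = Pattern.compile(
        "-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-");

    // Hypothetical helper: returns {val1, val2, val3}, using "" for a value
    // that is absent, or null when the content does not match at all.
    static String[] extract(String content) {
        Matcher m = REPORT.matcher(content);
        if (!m.find()) {
            return null;
        }
        String[] values = new String[3];
        for (int i = 0; i < 3; i++) {
            String g = m.group(i + 1); // null if the group did not participate
            values[i] = (g == null) ? "" : g;
        }
        return values;
    }

    public static void main(String[] args) {
        // File #1 from the question: all three values present.
        System.out.println(String.join("|", extract("-val1-test-val2-result-val3-done-")));
        // File #2 from the question: the "val2" part is missing.
        System.out.println(String.join("|", extract("-val1-test-val3-done-")));
    }
}
```

With the two sample files from the question, this should print `test|result|done` and `test||done`.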



Source: https://stackoverflow.com/questions/31772440/force-parsing-optional-groups
