regex to grab text if code exists

假如想象 submitted on 2020-01-24 12:02:40

Question


I'm trying to build a regex that captures the code value if the code name exists.

For example:

{(en56), (sc45), (da77), (cd29)}
{(en56), (sc45), (cd29)}

I wrote a regex like this:

{[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2}).*[(]da(?<da>\d{2}).*[(]cd(?<cd>\d{2}).*

It matches the first line, and the values are extracted. How can I make da optional, so that input that comes without it still matches?

When I tried with ?, it basically eliminates the values from the first result:

{[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2}).*([(]da(?<da>\d{2}))?.*[(]cd(?<cd>\d{2}).*


Answer 1:


New Answer

I just noticed you're using Qt regular expressions.
Since Qt uses the PCRE engine, you can take advantage of conditionals
not only to find the items optionally, but also to find them out of order.
Whether or not they are in order, it still finds them.

So all the bases are covered, and you get a look at an advanced
regular expression technique.

The idea is to find 1-4 items. This is done with a group construct and
a range quantifier: (?: ... | ... | ... | ...){1,4}

The upper bound is 4 because that is the number of items in the group.

Finally, each item is guarded with a conditional to ensure that the
item is not matched again. This is needed to ensure the upper limit of 4
refers to unique items, while the range makes each one optional.

The side benefit of this is that each item can match out of order,
which means the order in which the items appear in the source text
is irrelevant.

Good luck! I hope you get a chance to try this out.

Formatted and tested:

 #  {(?:.*?(?:(?(<en>)(?!))[(]en(?<en>\d{2})|(?(<sc>)(?!))[(]sc(?<sc>\d{2})|(?(<da>)(?!))[(]da(?<da>\d{2})|(?(<cd>)(?!))[(]cd(?<cd>\d{2}))){1,4}

 # Match 1-4 'Out-Of-Order' unique items
 # --------------------------------------------
 {
 (?:                  # Cluster start - loop to find out of order items
      .*? 
      (?:
           (?(<en>)             # Condition, not matched 'en' before
                (?!)
           )
           [(] en
           (?<en> \d{2} )       # (1)
        |                     # or,
           (?(<sc>)             # Condition, not matched 'sc' before
                (?!)
           )
           [(] sc
           (?<sc> \d{2} )       # (2)
        |                     # or,
           (?(<da>)             # Condition, not matched 'da' before
                (?!)
           )
           [(] da
           (?<da> \d{2} )       # (3)
        |                     # or,
           (?(<cd>)             # Condition, not matched 'cd' before
                (?!)
           )
           [(] cd
           (?<cd> \d{2} )       # (4)
      )
 ){1,4}               # Cluster end - find  1 to 4 unique items

Test Input

{(sc45), (en56), (da77), (cd29)}
{(da77), (cd29)}
{(en56), (sc45), (cd29)}
{(da77), (cd29) (en56), (sc45)}
{(sc45)}
{(en56), (cd29), (sc45)}

Output

 **  Grp 0      -  ( pos 0 , len 30 ) 
{(sc45), (en56), (da77), (cd29  
 **  Grp 1 [en] -  ( pos 12 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 4 , len 2 ) 
45  
 **  Grp 3 [da] -  ( pos 20 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 28 , len 2 ) 
29
------------  
 **  Grp 0      -  ( pos 34 , len 14 ) 
{(da77), (cd29  
 **  Grp 1 [en] -  NULL 
 **  Grp 2 [sc] -  NULL 
 **  Grp 3 [da] -  ( pos 38 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 46 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 52 , len 22 ) 
{(en56), (sc45), (cd29  
 **  Grp 1 [en] -  ( pos 56 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 64 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  ( pos 72 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 78 , len 29 ) 
{(da77), (cd29) (en56), (sc45  
 **  Grp 1 [en] -  ( pos 97 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 105 , len 2 ) 
45  
 **  Grp 3 [da] -  ( pos 82 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 90 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 111 , len 6 ) 
{(sc45  
 **  Grp 1 [en] -  NULL 
 **  Grp 2 [sc] -  ( pos 115 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  NULL 
------------  
 **  Grp 0      -  ( pos 121 , len 22 ) 
{(en56), (cd29), (sc45  
 **  Grp 1 [en] -  ( pos 125 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 141 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  ( pos 133 , len 2 ) 
29  
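
For comparison, the same order-independent extraction can be sketched in plain Python without conditionals. This is a different technique from the regex above, not a port of it: it scans for every known code and keeps the first occurrence of each. The helper name extract_codes is hypothetical, and unlike the regex it does not require the leading brace.

```python
import re

# Hypothetical helper, not part of the answer above: matches any of the
# four known code names followed by a two-digit value.
ITEM = re.compile(r"\((en|sc|da|cd)(\d{2})")

def extract_codes(line):
    """Return {code name: value} for the codes present, in any order."""
    found = {}
    for name, value in ITEM.findall(line):
        # setdefault keeps only the first occurrence of each code,
        # mirroring the "unique items" guard of the conditional regex.
        found.setdefault(name, value)
    return found

test_input = [
    "{(sc45), (en56), (da77), (cd29)}",
    "{(da77), (cd29)}",
    "{(en56), (sc45), (cd29)}",
    "{(da77), (cd29) (en56), (sc45)}",
    "{(sc45)}",
    "{(en56), (cd29), (sc45)}",
]

for line in test_input:
    print(line, "->", extract_codes(line))
```

Codes that are absent are simply missing from the returned dictionary, which plays the role of the NULL groups in the output above.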

Benchmark

Regex1:   {(?:.*?(?:(?(<en>)(?!))[(]en(?<en>\d{2})|(?(<sc>)(?!))[(]sc(?<sc>\d{2})|(?(<da>)(?!))[(]da(?<da>\d{2})|(?(<cd>)(?!))[(]cd(?<cd>\d{2}))){1,4}
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   6
Elapsed Time:    3.41 s,   3411.71 ms,   3411714 µs



Answer 2:


You can make that part of the regex optional by enclosing it in a non-capturing group with a ? quantifier, which matches one or zero occurrences of the subpattern it quantifies:

{[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2})(?:.*[(]da(?<da>\d{2}))?.*[(]cd(?<cd>\d{2}).*
                                     ^^^                   ^^


Using this technique you can make more parts of your regex optional if necessary.
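
A minimal way to check the difference, sketched in Python rather than Qt: Python's re module spells named groups (?P<name>...), so the patterns are adapted accordingly, and the names OPTIONAL_DA and GREEDY_ATTEMPT are only illustrative. The second pattern is the attempt from the question, where the greedy .* sits outside the optional group and leaves nothing for da to capture.

```python
import re

# The pattern from this answer, adapted to Python's named-group syntax;
# the leading brace is escaped to make it an unambiguous literal.
OPTIONAL_DA = re.compile(
    r"\{[(]en(?P<en>\d{2}).*[(]sc(?P<sc>\d{2})"
    r"(?:.*[(]da(?P<da>\d{2}))?"  # ".*" plus the da item, optional as a unit
    r".*[(]cd(?P<cd>\d{2}).*"
)

# The attempt from the question: ".*" is outside the optional group.
GREEDY_ATTEMPT = re.compile(
    r"\{[(]en(?P<en>\d{2}).*[(]sc(?P<sc>\d{2})"
    r".*([(]da(?P<da>\d{2}))?"
    r".*[(]cd(?P<cd>\d{2}).*"
)

with_da = "{(en56), (sc45), (da77), (cd29)}"
without_da = "{(en56), (sc45), (cd29)}"

print(OPTIONAL_DA.match(with_da).groupdict())     # da is captured
print(OPTIONAL_DA.match(without_da).groupdict())  # da is None, the rest match
print(GREEDY_ATTEMPT.match(with_da).group("da"))  # None: da is skipped
```

Moving the .* inside the optional group forces the engine to try locating the da item before it falls back to skipping it.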



Source: https://stackoverflow.com/questions/35734288/regex-to-grab-text-if-code-exists
