Match list of incrementing integers using regex

忘掉有多难 2021-02-04 10:02

Is it possible to match a list of comma-separated decimal integers, where the integers in the list always increment by one?

These should match:

0,1,2,3
8         


        
1 answer
  • 2021-02-04 10:41

    Yes, this is possible when using a regex engine that supports backreferences and conditionals.

    First, the list of consecutive numbers can be decomposed into a list in which each adjacent pair of numbers is consecutive:

    (?=(?&cons))\d+
    (?:,(?=(?&cons))\d+)*
    ,\d+
    

    Here (?=(?&cons)) is a placeholder for a predicate that ensures that two numbers are consecutive. This predicate might look as follows:

    (?<cons>\b(?:
        (?<x>\d*)
        (?:(?<a0>0)|(?<a1>1)|(?<a2>2)|(?<a3>3)|(?<a4>4)
                   |(?<a5>5)|(?<a6>6)|(?<a7>7)|(?<a8>8))
        (?:9(?= 9*,\g{x}\d (?<y>\g{y}?+ 0)))*
        ,\g{x}
        (?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
        (?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
        (?(y)\g{y})
        # handle the 999 => 1000 case separately
      | (?:9(?= 9*,1 (?<z>\g{z}?+ 0)))+
        ,1\g{z}
    )\b)
    

    For a brief explanation, the second case handling 999,1000 type pairs is easier to understand -- there is a very detailed description of how it works in this answer concerned with matching a^n b^n. The connection between the two is that in this case we need to match 9^n ,1 0^n.
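
To make the target shape concrete, here is a minimal Python sketch that checks the 9^n ,1 0^n language with plain string operations. It only illustrates which strings the second alternative is meant to accept, not the regex technique itself; the function name `is_all_nines_rollover` is made up for this example.

```python
def is_all_nines_rollover(pair: str) -> bool:
    """Return True for strings of the form 9^n ,1 0^n with n >= 1."""
    left, sep, right = pair.partition(",")
    n = len(left)
    # The left side must be n nines, the right side a 1 followed by n zeros.
    return sep == "," and n >= 1 and left == "9" * n and right == "1" + "0" * n
```

The count n has to agree on both sides of the comma, which is why the regex version needs the same machinery as a^n b^n.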

    The first case is slightly more complicated. The largest part of it handles the simple case of incrementing a decimal digit, which is relatively verbose due to the number of said digits:

    (?:(?<a0>0)|(?<a1>1)|(?<a2>2)|(?<a3>3)|(?<a4>4)
               |(?<a5>5)|(?<a6>6)|(?<a7>7)|(?<a8>8))
    
    (?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
    (?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
    

    The first block captures whether the digit is N into group aN, and the second block then uses conditionals to check which of these groups was used. If group aN is non-empty, the next digit should be N+1.
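
The capture-then-conditional idea can be tried in isolation. The sketch below uses Python's `re` module, which supports conditionals but spells named groups as `(?P<name>...)`. It handles only a single digit followed by its successor; the names `single_digit_step` and `is_single_digit_step` are made up for this example.

```python
import re

# One alternative per digit 0-8, each captured in its own named group aN.
first_block = "|".join(f"(?P<a{d}>{d})" for d in range(9))
# One conditional per group: if aN took part in the match, require N+1.
second_block = "".join(f"(?(a{d}){d + 1})" for d in range(9))

single_digit_step = re.compile(rf"(?:{first_block}),{second_block}")


def is_single_digit_step(text: str) -> bool:
    """Return True if text is 'N,M' with single digits N in 0..8 and M = N + 1."""
    return single_digit_step.fullmatch(text) is not None
```

The digit 9 is deliberately absent from the first block, since incrementing it needs a carry, which the rest of the pattern deals with.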

    The remainder of the first case handles cases like 1999,2000. This again falls into the pattern N 9^n, N+1 0^n, so this is a combination of the method for matching a^n b^n and incrementing a decimal digit. The simple case of 1,2 is handled as the limiting case where n=0.

    Complete regex: https://regex101.com/r/zG4zV0/1


    Alternatively, the (?&cons) predicate can be implemented slightly more directly if recursive subpattern references are supported:

    (?<cons>\b(?:
        (?<x>\d*)
        (?:(?<a0>0)|(?<a1>1)|(?<a2>2)|(?<a3>3)|(?<a4>4)
                   |(?<a5>5)|(?<a6>6)|(?<a7>7)|(?<a8>8))
        (?<y>
            ,\g{x}
            (?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
            (?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
          | 9 (?&y) 0
        )
        # handle the 999 => 1000 case separately
      | (?<z> 9,10 | 9(?&z)0 )
    )\b)
    

    In this case the two grammars 9^n ,1 0^n, n>=1 and prefix N 9^n , prefix N+1 0^n, n>=0 are pretty much just written out explicitly.
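
When testing either pattern, a reference check that does not use regex at all can be handy. The following Python sketch spells out the same two grammars with string operations. It is an assumption-laden helper rather than a port of the regex: all function names are made up, a single number is treated as a valid list, and digit strings are compared textually, so leading zeros are kept as written.

```python
def is_decimal(s: str) -> bool:
    """Return True for a non-empty string of ASCII digits."""
    return s.isascii() and s.isdigit()


def is_consecutive_pair(a: str, b: str) -> bool:
    """Return True if digit string b is the successor of digit string a."""
    if not (is_decimal(a) and is_decimal(b)):
        return False
    stripped = a.rstrip("9")
    n = len(a) - len(stripped)  # number of trailing 9s
    if not stripped:
        # Grammar 1: 9^n ,1 0^n (n >= 1 because a is non-empty)
        return b == "1" + "0" * n
    # Grammar 2: prefix N 9^n , prefix N+1 0^n with N in 0..8 and n >= 0
    prefix, digit = stripped[:-1], stripped[-1]
    return b == prefix + str(int(digit) + 1) + "0" * n


def is_incrementing_list(text: str) -> bool:
    """Return True if every adjacent pair in a comma-separated list is consecutive."""
    items = text.split(",")
    if not all(is_decimal(item) for item in items):
        return False
    return all(is_consecutive_pair(a, b) for a, b in zip(items, items[1:]))
```

For example, `is_incrementing_list("1998,1999,2000")` exercises both the plain increment and the carry case, while `is_incrementing_list("1,2,4")` fails on the last pair.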

    Complete alternative regex: https://regex101.com/r/zG4zV0/3
