Is it possible to match a list of comma-separated decimal integers, where the integers in the list always increment by one?
These should match:
0,1,2,3
8
Yes, this is possible when using a regex engine that supports backreferences and conditions.
First, the list of consecutive numbers can be decomposed into a list where each pair of numbers are consecutive:
(?=(?&cons))\d+
(?:,(?=(?&cons))\d+)*
,\d+
Here (?=(?&cons))
is a placeholder for a predicate that ensures that two numbers are consecutive. This predicate might look as follows:
(?\b(?:
(?\d*)
(?:(?0)|(?1)|(?2)|(?3)|(?4)
|(?5)|(?6)|(?7)|(?8))
(?:9(?= 9*,\g{x}\d (?\g{y}?+ 0)))*
,\g{x}
(?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
(?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
(?(y)\g{y})
# handle the 999 => 1000 case separately
| (?:9(?= 9*,1 (?\g{z}?+ 0)))+
,1\g{z}
)\b)
For a brief explanation, the second case handling 999,1000
type pairs is easier to understand -- there is a very detailed description of how it works in this answer concerned with matching a^n b^n. The connection between the two is that in this case we need to match 9^n ,1 0^n
.
The first case is slightly more complicated. The largest part of it handles the simple case of incrementing a decimal digit, which is relatively verbose due to the number of said digits:
(?:(?0)|(?1)|(?2)|(?3)|(?4)
|(?5)|(?6)|(?7)|(?8))
(?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
(?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
The first block will capture whether the digit is N into group aN and the second block will then uses conditionals to check which of these groups was used. If group aN is non-empty, the next digit should be N+1.
The remainder of the first case handles cases like 1999,2000
. This again falls into the pattern N 9^n, N+1 0^n
, so this is a combination of the method for matching a^n b^n
and incrementing a decimal digit. The simple case of 1,2
is handled as the limiting case where n=0.
Complete regex: https://regex101.com/r/zG4zV0/1
Alternatively the (?&cons)
predicate can be implemented slightly more directly if recursive subpattern references are supported:
(?\b(?:
(?\d*)
(?:(?0)|(?1)|(?2)|(?3)|(?4)
|(?5)|(?6)|(?7)|(?8))
(?
,\g{x}
(?(a0)1)(?(a1)2)(?(a2)3)(?(a3)4)(?(a4)5)
(?(a5)6)(?(a6)7)(?(a7)8)(?(a8)9)
| 9 (?&y) 0
)
# handle the 999 => 1000 case separately
| (? 9,10 | 9(?&z)0 )
)\b)
In this case the two grammars 9^n ,1 0^n
, n>=1 and prefix N 9^n , prefix N+1 0^n
, n>=0 are pretty much just written out explicitly.
Complete alternative regex: https://regex101.com/r/zG4zV0/3