I am self-studying regular expressions and found an interesting practice problem online that involves writing a regular expression to recognize all binary numbers divisible by 3.
The problem you're encountering is that whilst your trick is (probably) valid, it doesn't map to a practical DFA (you have to track a potentially arbitrary difference between the number of even and odd ones, which would require an arbitrary number of states).
An alternative approach is to note that (working from MSB to LSB) after the i-th character, x[i], the value of the prefix read so far must be equal to 0, 1, or 2 in modulo-3 arithmetic; call this value S[i]. x[i+1] must be either 0 or 1, so appending it is equivalent to multiplying the prefix value by 2 and optionally adding 1. So if you know S[i] and x[i+1], you can calculate S[i+1]. Does that description sound familiar?
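The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function name `remainder_mod3` is made up here, and the input is assumed to be a string of `0` and `1` characters read from MSB to LSB.

```python
def remainder_mod3(bits: str) -> int:
    """Return the value of a binary string modulo 3, one character at a time."""
    s = 0  # S[0]: remainder of the empty prefix
    for ch in bits:
        # Appending a bit doubles the prefix value and adds the bit,
        # so S[i+1] = (2 * S[i] + x[i+1]) mod 3.
        s = (2 * s + int(ch)) % 3
    return s


if __name__ == "__main__":
    for sample in ("110", "111", "101"):
        print(sample, remainder_mod3(sample))
```

Only the current remainder is carried between steps, which is why three states are enough.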
I have another approach to this problem, and I think it is easier to understand.
When we are dividing a number by 3 we can have three remainders: 0, 1, 2.
We can describe a number which is divisible by 3 using the expression 3t (t is a natural number).
When we append a 0 to a binary number whose remainder is 0, the value is doubled, because each digit moves to a higher position: 3t * 2 = 6t, which is also divisible by 3.
When we append a 1 to a binary number whose remainder is 0, the value is doubled plus 1, because each digit moves to a higher position and is followed by a 1: 3t * 2 + 1 = 6t + 1, so the remainder is 1.
When we append a 1 to a binary number whose remainder is 1, the value is doubled plus 1, and the remainder is 0: (3t + 1)*2 + 1 = 6t + 3 = 3(2t + 1), which is divisible by 3.
When we append a 0 to a binary number whose remainder is 1, the value is doubled, and the remainder is 2: (3t + 1)*2 = 6t + 2.
When we append a 0 to a binary number whose remainder is 2, the remainder is 1: (3t + 2)*2 = 6t + 4 = 3(2t + 1) + 1.
When we append a 1 to a binary number whose remainder is 2, the remainder is still 2: (3t + 2)*2 + 1 = 6t + 5 = 3(2t + 1) + 2.
No matter how many 1s you append to a binary number whose remainder is 2, the remainder stays 2: (3(2t + 1) + 2)*2 + 1 = 3(4t + 2) + 5 = 3(4t + 3) + 2.
So we can build the following DFA to describe these binary numbers:
Note: the edge q2 -> q1 should be labelled 0.
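The six cases worked out above can be written down as a transition table. The following is a minimal sketch, assuming the states are named q0, q1 and q2 for remainders 0, 1 and 2, with q0 as both the start and the accepting state; the names `TRANSITIONS` and `accepts` are chosen here for illustration.

```python
# (current state, input bit) -> next state, taken from the six cases above.
TRANSITIONS = {
    ("q0", "0"): "q0",  # 3t * 2       = 6t
    ("q0", "1"): "q1",  # 3t * 2 + 1   = 6t + 1
    ("q1", "0"): "q2",  # (3t+1)*2     = 6t + 2
    ("q1", "1"): "q0",  # (3t+1)*2 + 1 = 3(2t + 1)
    ("q2", "0"): "q1",  # (3t+2)*2     = 3(2t + 1) + 1
    ("q2", "1"): "q2",  # (3t+2)*2 + 1 = 3(2t + 1) + 2
}


def accepts(bits: str) -> bool:
    """Run the DFA over a binary string; accept if it ends in q0."""
    state = "q0"
    for ch in bits:
        state = TRANSITIONS[(state, ch)]
    return state == "q0"


if __name__ == "__main__":
    for n in (3, 5, 6, 7, 9):
        print(n, format(n, "b"), accepts(format(n, "b")))
```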
Following what Oli Charlesworth says, you can build a DFA for divisibility of a base-b number by a certain divisor d, where the states of the DFA represent the remainders of the division.
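As a rough sketch of that construction: from remainder r, reading digit k leads to remainder (r * b + k) mod d. The helper names `build_dfa` and `divisible` below are hypothetical, and the input is assumed to be a non-empty digit string that is valid in the given base (at most base 36, since Python's `int` is used to decode each digit).

```python
def build_dfa(base: int, divisor: int) -> dict:
    """Map (remainder, digit) -> next remainder for the given base and divisor."""
    return {
        (r, k): (r * base + k) % divisor
        for r in range(divisor)
        for k in range(base)
    }


def divisible(digits: str, base: int, divisor: int) -> bool:
    """Run the remainder DFA over the digit string, MSB first."""
    table = build_dfa(base, divisor)
    state = 0  # start in the "remainder 0" state
    for ch in digits:
        state = table[(state, int(ch, base))]
    return state == 0  # accept only when the remainder is 0


if __name__ == "__main__":
    print(divisible("110", 2, 3))   # 6 in binary, divisor 3
    print(divisible("343", 10, 7))  # 343 in decimal, divisor 7
```

The table always has d states and b outgoing edges per state, regardless of how long the input is.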
For your case (base 2, i.e. binary numbers, with divisor d = 3 in decimal):
Note that the DFA above accepts the empty string as a "number" divisible by 3. This can easily be fixed by adding one more intermediate state in front:
Conversion to theoretical regular expression can be done with the normal process.
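For the binary, divisor-3 case, the usual state-elimination procedure can be sketched as follows, assuming the three-state remainder DFA with q0 as start and accepting state. Eliminating q2 leaves a loop `01*0` on q1; eliminating q1 then leaves the loops `0` and `1(01*0)*1` on q0. The Python check below is only a sanity test of that result; the names `DIV3` and `DIV3_NONEMPTY` are made up here.

```python
import re

# Result of state elimination on the remainder DFA:
#   q2 removed: q1 --0--> q2 --1*--> q2 --0--> q1 becomes a loop 01*0 on q1
#   q1 removed: q0 --1--> q1 --(01*0)*--> q1 --1--> q0 becomes 1(01*0)*1
DIV3 = re.compile(r"(0|1(01*0)*1)*")

# Variant that rejects the empty string by requiring at least one loop.
DIV3_NONEMPTY = re.compile(r"(0|1(01*0)*1)+")


def matches(pattern: re.Pattern, bits: str) -> bool:
    """True if the whole string is matched by the pattern."""
    return pattern.fullmatch(bits) is not None


if __name__ == "__main__":
    for n in range(1, 16):
        print(n, format(n, "b"), matches(DIV3_NONEMPTY, format(n, "b")))
```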
Conversion to a practical regex, in flavors that support recursive regex, can be done easily once you have the DFA. This is shown for the case of base b = 10 and d = 7 (in decimal) in this question from CodeGolf.SE.
Let me quote the regex in the answer by Lowjacker, written in Ruby regex flavor:
(?!$)(?>(|(?<B>4\g<A>|5\g<B>|6\g<C>|[07]\g<D>|[18]\g<E>|[29]\g<F>|3\g<G>))(|(?<C>[18]\g<A>|[29]\g<B>|3\g<C>|4\g<D>|5\g<E>|6\g<F>|[07]\g<G>))(|(?<D>5\g<A>|6\g<B>|[07]\g<C>|[18]\g<D>|[29]\g<E>|3\g<F>|4\g<G>))(|(?<E>[29]\g<A>|3\g<B>|4\g<C>|5\g<D>|6\g<E>|[07]\g<F>|[18]\g<G>))(|(?<F>6\g<A>|[07]\g<B>|[18]\g<C>|[29]\g<D>|3\g<E>|4\g<F>|5\g<G>))(|(?<G>3\g<A>|4\g<B>|5\g<C>|6\g<D>|[07]\g<E>|[18]\g<F>|[29]\g<G>)))(?<A>$|[07]\g<A>|[18]\g<B>|[29]\g<C>|3\g<D>|4\g<E>|5\g<F>|6\g<G>)
Breaking it down, you can see how it is constructed. The atomic grouping (or non-backtracking group, or a group that behaves possessively) is used to make sure only the empty string alternative is matched. This is a trick to emulate (?DEFINE) in Perl. Then the groups A to G correspond to remainders 0 to 6 when the number is divided by 7.
(?!$)
(?>
(|(?<B>4 \g<A>|5 \g<B>|6 \g<C>|[07]\g<D>|[18]\g<E>|[29]\g<F>|3 \g<G>))
(|(?<C>[18]\g<A>|[29]\g<B>|3 \g<C>|4 \g<D>|5 \g<E>|6 \g<F>|[07]\g<G>))
(|(?<D>5 \g<A>|6 \g<B>|[07]\g<C>|[18]\g<D>|[29]\g<E>|3 \g<F>|4 \g<G>))
(|(?<E>[29]\g<A>|3 \g<B>|4 \g<C>|5 \g<D>|6 \g<E>|[07]\g<F>|[18]\g<G>))
(|(?<F>6 \g<A>|[07]\g<B>|[18]\g<C>|[29]\g<D>|3 \g<E>|4 \g<F>|5 \g<G>))
(|(?<G>3 \g<A>|4 \g<B>|5 \g<C>|6 \g<D>|[07]\g<E>|[18]\g<F>|[29]\g<G>))
)
(?<A>$| [07]\g<A>|[18]\g<B>|[29]\g<C>|3 \g<D>|4 \g<E>|5 \g<F>|6 \g<G>)
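To see where the alternatives in each group come from, here is a small Python sketch that regenerates the body of a group from the rule "remainder r followed by digit k leads to remainder (10r + k) mod 7". The function name `group_body` and the letter mapping are assumptions made for illustration; the sketch only builds the pattern text and does not run it, since Python's `re` module does not support `\g<name>` subexpression calls.

```python
def group_body(r: int, base: int = 10, divisor: int = 7) -> str:
    """Build the alternation for the group representing remainder r."""
    names = "ABCDEFG"[:divisor]  # A = remainder 0, ..., G = remainder 6
    by_target = {}
    for digit in range(base):
        target = (r * base + digit) % divisor
        by_target.setdefault(target, []).append(str(digit))
    parts = []
    for target in sorted(by_target):
        digits = by_target[target]
        # A single digit stands alone; several digits become a character class.
        cls = digits[0] if len(digits) == 1 else "[" + "".join(digits) + "]"
        parts.append(cls + r"\g<" + names[target] + ">")
    return "|".join(parts)


if __name__ == "__main__":
    for r, name in enumerate("ABCDEFG"):
        print(name, group_body(r))
```

Group A in the quoted regex additionally has the `$` alternative, which is what makes remainder 0 the accepting state.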
Binary numbers divisible by 3 fall into 3 categories:
(ex. 11, 110, 1100,1001,10010, 1111)
(decimal: 3, 6, 12, 9, 18, 15)
(ex. 10101, 101010, 1010001, 1000101)
(decimal: 21, 42, 81, 69)
(ex. 1010111, 1110101, 1011100110001)
(decimal: 87, 117, 5937)
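So a regular expression that takes these three rules into account is the one below. Note that it only matches multiples of 3, but it does not match all of them: for example, 101101 (decimal 45) is not matched.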
0*(1(00)*10*|10(00)*1(00)*(11)*0(00)*10*)*0*
How to read it:
() group a sub-expression
* means the preceding digit or group is repeated zero or more times
| indicates a choice between the options on either side, within the parentheses
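A candidate expression like the one above can be checked against plain arithmetic. The following Python sketch is a hypothetical test harness (the names `CANDIDATE` and `check` are made up here): it compares the regex with `n % 3` for every positive integer below a limit and reports any disagreements.

```python
import re

# The expression quoted above, anchored by using fullmatch below.
CANDIDATE = re.compile(r"0*(1(00)*10*|10(00)*1(00)*(11)*0(00)*10*)*0*")


def check(limit: int):
    """Return (false_positives, missed) for all n with 1 <= n < limit."""
    false_positives, missed = [], []
    for n in range(1, limit):
        matched = CANDIDATE.fullmatch(format(n, "b")) is not None
        if matched and n % 3 != 0:
            false_positives.append(n)  # matched, but not a multiple of 3
        elif not matched and n % 3 == 0:
            missed.append(n)           # a multiple of 3 the regex rejects
    return false_positives, missed


if __name__ == "__main__":
    fp, missed = check(64)
    print("false positives:", fp)
    print("missed multiples of 3:", missed)
```

With this harness the expression produces no false positives, but some multiples of 3, such as 45 (101101), end up in the missed list.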