L = {1i - 1j = 1i-j: i-j >= 0, i,j>=0}
I\'m confused about how to construct a grammar that tracks subtraction of a stri
Here's a tip: always try to think of context-free languages in terms of parenthetic balancing.
Consider the following two languages:
ab ()
aabb (())
aaabbb ((()))
aaaabbbb (((())))
... ...
Most of us "see" the first language as a repetition (some a's followed by the same number of b's) and the second one as a nesting. But the context-free grammars for these two languages are identical:
S: ab S: ()
| a S b | ( S )
Because the CFG is being parsed with a stack automaton, the nesting is natural. The first language is actually some number of a's followed by the same number of b's, reversed. Of course, some number of b's and some number of b's reversed look identical... until you draw the derivation tree.
So consider the unary addition language: { 1i+j = 1i + 1j }
.
Clearly that's the same as { 1j+i = 1i + 1j }
(switching the order of addition makes no difference). Or, writing out the addition as a simple concatenation: { 1j 1i = 1i + 1j }
. Now, we can just group the symmetry: (here the parentheses are metasymbols to show grouping, not part of the language): { 1j ( ( 1i = 1i ) + ) 1j }
.
Which leads to
L1 → =
L1 → 1 L1 1
L2 → L1 +
L2 → 1 L2 1
In the CFG, the addition has been obscured a bit, but it's clear what is happening: we're allowing a sentence with the same number of 1s on both sides of an =
, with a single +
being inserted anywhere on the right-hand side. (With a very simple change to the grammar, we could allow multiple +
signs, making out grammar accept addition equations with more than one addend.)
Whenever I see these kinds of problems, I try to think about the smallest strings in the language and then rules to get bigger strings. The smallest string in this language may be - =
if we allow i - j, i, j = 0
; I might dispute whether this can be seriously thought of as a proper encoding of 0 - 0 = 0
in unary, but I digress. The following derivation works equally well if you require i - j, i, j > 0
and consider 11 - 1 = 1
to be the shortest string in your language. How do we get longer strings in the language?
First, note that we can add 1
to the beginning and end of the string and get another string in the language: - =
is a string, and so are 1 - = 1
, 11 - = 11
, ..., 1^n - = 1^n
. This suggests that we will have rules like S -> 1S1
and S -> T
.
Second, note that the difference 1^i - 1^j
remains the same if we increase i
and j
by one each. So, if 1^i - 1^j = 1^(i+j)
is in our language, then so is 1^(i+1) - 1^(j+1) = 1^(i+j)
. Clearly, we need a rule that allows us to put some 1
s in between the -
and the =
, since we don't have one already. This observation says that when we do that, we have to put one before the -
. We might guess a good rule is T -> 1T1 | -
.
Considering our grammar so far, we have:
S -> 1T1 | T
T -> 1T1 | -
This gives us 1^s 1^t - 1^t 1^s
. This is actually pretty close, but we are missing =
. We see it needs to come after T
, and in fact, we can just add it to the production S -> T
to get
S -> 1S1 | T =
S -> 1T1 | -
This gives us 1^s 1^t - 1^t = 1^s
. Since i = s + t
, j = t
and i - j = s + t - t = s
as we have on the right of =
, it looks right in that the grammar should only generate strings in the language. But does it generate all strings in the language? We can proceed by mathematical induction on the number of 1
s in the strings. Before we begin, note that the number of 1
s must always be even, since the equation never holds if just one, or all three, of the strings of 1
s have odd length, and those are the only cases where the total number of 1
s could be odd.
Base Case: there is only one string in L
with no 1
s, namely - =
, and it is derived by S -> T = -> - =
.
Induction hypothesis: assume that all strings in L
with no more than k
instances of 1
are generated by the grammar.
Induction step: we need to show any string with k + 2
(remember, the number of 1
s in any string in L
must be even by our earlier observation) instances of 1
can be generated by the grammar. our string can be written 1^a - 1^b = 1^c
where a + b + c = k + 2
, a - b = c
c <= a
and b <= a
must be true. if a = b
and c = 0
, then there is a string of length k
with a' = a - 1
and b' = b - 1
also in L
; by the induction hypothesis, it is generated by the grammar, and by inspection we see that we can insert another application of T -> 1T1
in the derivation before concluding it with T -> -
in order to get our string. If, on the other hand, a > b
, then there is a shorter string in L
with a' = a - 1
and c' = c - 1
. By the induction hypothesis, this is generated by our grammar, and by inspection we see we can insert an extra application of S -> 1S1
into the derivation before S -> T =
in order to get our string. In both cases the grammar generates the string of length k + 2
, so all strings in L
are generated by the grammar.