问题
Besides strings of any number of a's and b's like aa.. or bb.. ,Would regular expression (a*+b*) contain a string like
ab
or any string ending with b ?
Is (a*+b*) same as (a* b*) ?
I am a little bit confuse about the strings generated by regular expression (a*+b*) and would really appreciate if someone can help.
回答1:
Unless you're working with a regex language which explicitly classifies the *+
as a special token which either has a special meaning, or is reserved for future extension (and produces defined behavior now, or a syntax error), the natural parse of a*+
is that it means (a*)+
: the postfix +
is applied to the expression a*
.
If that interpretation applies, next we can observe that (a*)+
is equivalent to just a*
. Therefore a*+b*
is the same as a*b*
.
Firstly, by definition R+
means RR*
. Match one R
and then zero or more of them. Therefore, we can rewrite (a*)+
as (a*)(a*)*
.
Secondly, *
is idempotent, so (a*)*
is is just (a*)
. If we match "zero or more a
", zero or more times, nothing changes; the net effect is zero or more a
. Proof: R*
denotes this infinite expansion: (|R|RR|RRR|RRRR|RRRRR|...)
: match nothing, or match one R
, or match two R
's, ... Therefore, (a*)*
dentes this expansion: (|a*|a*a*|a*a*a*|...)
. These inner a*
-s in turn denote individual second-level expansions: (|(|a|aa|aaa|...|)|(|a|aa|aaa|...)(a|a|aaa|...))|...)
. By the associative property of the branch |
, we can flatten a structure like (a|(b|c))
into (a|b|c)
, and when we do this to the expansion, we note that there are numerous identical terms—the empty regex ()
, the single a
, the double aa
and so on. These all reduce to a single copy, because (|||)
is equivalent to ()
and (a|a|a|a|...)
is equivalent to just (a)
and so on. That is to say, when we sort the terms by increasing length, and squash multiple identical terms to just one copy, we end up with (|a|aa|aaa|aaaa|...)
, which is recognizable as the expansion of just a*
. Thus (a*)*
is a*
.
Lastly, (a*)(a*)
just means a*
. Proof: Similarly to the previous, we expand into branches: (|a|aa|aaa|...)(|a|aa|aaa|...)
. Next we note that the catenation of branch expressions is equivalent to a Cartesian Product set of the terms. That is to say (a|b|c|..)(i|j|k|...)
means, precisely: (ai|aj|ik|...|bi|bj|bk|...|ci|cj|ck|...|...)
. When we apply this product to (|a|aa|aaa|...)(|a|aa|aaa|...)
we get a proliferation of terms, which, when arranged in increasing length and subject to de-duplication, reduce to (|a|aa|aaa|aaaa|...)
, and that is just a*
.
回答2:
I think it helps here to look at a formal definition of regular expressions, i.e. to look for each regular expression e which language L(e) does it produce.
So let's start simple:
(1) What about the regexp a (only the letter)? Its language is
L(a) := {a},
just the single word/character "a".
(2) For the regexp e1 + e2, where e1 and e2 are regexps themselves,
L(e1 + e2) := L(e1) U L(e2).
So e.g. if a and b are characters, L(a+b) = {a, b}.
(3) For the regexp e1 e2 (concatenation), where e1 and e2 are regexps themselves,
L(e1 e2) := all words w such that
we can write w = w_1w_2 with w_1 in L(e1) and w_2 in L(e2)".
(4) What about a regular expression *e**, where e might be a regular expression itself? Intuitively, a word is in L(e*) if it has the form w_1 w_2w_3w_4...w_n, with w_i in L(e) for each i. So
L(e*) := all words w such that we can write
w = w_1 w_2 .. w_n
for a n >= 0 with all w_i in L(e) (for i = 1, 2, ..., n)
So, what about L((a* + b*))?
L((a* + b*))
(according to rule 2)
= L(a*) U L(b*)
(according to rule 4/1)
= {eps, a, aa, aaa, aaaa, ....} U {eps, b, bb, bbb, bbbb}
= all strings that have either only a's OR only b's in it
(including eps, the so-called empty word)
Similarly for (a* b*):
L((a* b*))
(according to rule 3)
= all words w = w_1 w_2 with w_1 in L(a*) and w_2 in L(b*)
= {eps eps, eps b, a eps, ab, aa eps, aab, ...}
= {eps, b, a, ab, aa, aab, aabb, ... }
= all strings that first have zero or more a's, then zero or more b's.
For the beginning I think it helps to "deconstruct" the regular expression, as we did above - since regular expressions can also be seen as trees, just like the more known arithmetic expressions, for example:
+
/ \
* *
| |
a b
来源:https://stackoverflow.com/questions/34240738/what-are-the-type-of-strings-generated-by-ab