I understand regular expressions reasonably well, but I don't get to make use of them often enough to be an expert. I ran across a regular expression that I am using to va
To break it down completely:
^ -- Match the beginning of the line
(?=.*\d) -- Lookahead: the rest of the string contains a digit
(?=.*[a-z]) -- Lookahead: the rest of the string contains a lowercase letter
(?=.*[A-Z]) -- Lookahead: the rest of the string contains an uppercase letter
.{6,} -- Match 6 or more characters (any character except a newline)
$ -- Match the end of the line
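As a quick sanity check, here is a minimal sketch that runs the whole pattern against a few candidate strings. It uses Python's `re` module, which is my choice and not something from the question. Python supports the same lookahead syntax. The helper name `is_valid` and the sample strings are made up for illustration.

```python
import re

# The pattern broken down above: three lookaheads, then 6+ characters to the end.
PASSWORD_RE = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$")


def is_valid(candidate: str) -> bool:
    """Return True if candidate has a digit, a lowercase and an uppercase letter, and 6+ chars."""
    return PASSWORD_RE.match(candidate) is not None


samples = ["abc123", "Abc123", "ABC123", "Ab1", "Passw0rd", "password1"]
for s in samples:
    # Only "Abc123" and "Passw0rd" satisfy every condition.
    print(f"{s!r:12} -> {is_valid(s)}")
```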
Under normal circumstances, a piece of a regular expression matches a piece of the input string, and "consumes" that piece of the string. The next piece of the expression matches the next piece of the string, and so on.
Lookahead assertions don't consume any of the string, so your three lookahead assertions:
(?=.*\d)
(?=.*[a-z])
(?=.*[A-Z])
each mean "This pattern (anything followed by, respectively, a digit, a lowercase letter, or an uppercase letter) must appear somewhere in the string", but they don't move the current match position forward, so the remainder of the expression:
.{6,}
(which means "six or more characters") must still match the whole input string on its own, from ^ to $.
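The "doesn't consume" point can be checked directly. The following Python sketch is an illustration and not part of the original answer. A lookahead on its own produces an empty match at position 0, while the same subpattern without `(?=...)` consumes the text it matched. The sketch also shows that a string passing all three lookaheads can still fail on length.

```python
import re

PATTERN = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$")

# A lookahead on its own succeeds, but the match is empty: it starts and ends at 0.
lookahead = re.match(r"(?=.*\d)", "abc1")
print(lookahead.span())  # (0, 0)

# The same subpattern without the lookahead consumes everything it matched.
consuming = re.match(r".*\d", "abc1")
print(consuming.span())  # (0, 4)

# "aB3" passes all three lookaheads, but .{6,} must still cover the string by itself.
print(bool(PATTERN.match("aB3")))     # False
print(bool(PATTERN.match("aB3xyz")))  # True
```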
I checked how Perl matches it by using the debug mode of the re pragma:
perl -Mre=debug -E'q[ abc 345 DEF ]=~/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$/'
Compiling REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$"
synthetic stclass "ANYOF[\0-\11\13-\377{unicode_all}]".
Final program:
1: BOL (2)
2: IFMATCH[0] (9)
4: STAR (6)
5: REG_ANY (0)
6: DIGIT (7)
7: SUCCEED (0)
8: TAIL (9)
9: IFMATCH[0] (26)
11: STAR (13)
12: REG_ANY (0)
13: ANYOF[a-z] (24)
24: SUCCEED (0)
25: TAIL (26)
26: IFMATCH[0] (43)
28: STAR (30)
29: REG_ANY (0)
30: ANYOF[A-Z] (41)
41: SUCCEED (0)
42: TAIL (43)
43: CURLY {6,32767} (46)
45: REG_ANY (0)
46: EOL (47)
47: END (0)
floating ""$ at 6..2147483647 (checking floating) stclass ANYOF[\0-\11\13-\377{unicode_all}] anchored(BOL) minlen 6
Guessing start of match in sv for REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$" against " abc 345 DEF "
Found floating substr ""$ at offset 16...
start_shift: 6 check_at: 16 s: 0 endpos: 11
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$" against " abc 345 DEF "
0 <> < abc 345> | 1:BOL(2)
0 <> < abc 345> | 2:IFMATCH[0](9)
0 <> < abc 345> | 4: STAR(6)
REG_ANY can match 16 times out of 2147483647...
16 <c 345 DEF > <> | 6: DIGIT(7) # failed...
15 <c 345 DEF > < > | 6: DIGIT(7) # failed...
14 <c 345 DEF> < > | 6: DIGIT(7) # failed...
13 <c 345 DE> <F > | 6: DIGIT(7) # failed...
12 <c 345 D> <EF > | 6: DIGIT(7) # failed...
11 <c 345 > <DEF > | 6: DIGIT(7) # failed...
10 <c 345> < DEF > | 6: DIGIT(7) # failed...
9 <c 34> <5 DEF > | 6: DIGIT(7)
10 <c 345> < DEF > | 7: SUCCEED(0)
subpattern success...
0 <> < abc 345> | 9:IFMATCH[0](26)
0 <> < abc 345> | 11: STAR(13)
REG_ANY can match 16 times out of 2147483647...
16 <c 345 DEF > <> | 13: ANYOF[a-z](24) # failed...
15 <c 345 DEF > < > | 13: ANYOF[a-z](24) # failed...
14 <c 345 DEF> < > | 13: ANYOF[a-z](24) # failed...
13 <c 345 DE> <F > | 13: ANYOF[a-z](24) # failed...
12 <c 345 D> <EF > | 13: ANYOF[a-z](24) # failed...
11 <c 345 > <DEF > | 13: ANYOF[a-z](24) # failed...
10 <c 345> < DEF > | 13: ANYOF[a-z](24) # failed...
9 <c 34> <5 DEF > | 13: ANYOF[a-z](24) # failed...
8 <bc 3> <45 DEF > | 13: ANYOF[a-z](24) # failed...
7 <abc > <345 DEF > | 13: ANYOF[a-z](24) # failed...
6 < abc > < 345 DEF > | 13: ANYOF[a-z](24) # failed...
5 < abc> < 345 DEF > | 13: ANYOF[a-z](24) # failed...
4 < ab> <c 345 DEF> | 13: ANYOF[a-z](24)
5 < abc> < 345 DEF > | 24: SUCCEED(0)
subpattern success...
0 <> < abc 345> | 26:IFMATCH[0](43)
0 <> < abc 345> | 28: STAR(30)
REG_ANY can match 16 times out of 2147483647...
16 <c 345 DEF > <> | 30: ANYOF[A-Z](41) # failed...
15 <c 345 DEF > < > | 30: ANYOF[A-Z](41) # failed...
14 <c 345 DEF> < > | 30: ANYOF[A-Z](41) # failed...
13 <c 345 DE> <F > | 30: ANYOF[A-Z](41)
14 <c 345 DEF> < > | 41: SUCCEED(0)
subpattern success...
0 <> < abc 345> | 43:CURLY {6,32767}(46)
REG_ANY can match 16 times out of 2147483647...
16 <c 345 DEF > <> | 46: EOL(47)
16 <c 345 DEF > <> | 47: END(0)
Match successful!
Freeing REx: "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$"
(I slightly modified the output.)
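To make the backtracking in the trace concrete, here is a deliberately simplified Python model of how `.*X` is tried inside one lookahead. First `.*` takes every character. Then it gives them back one at a time, and at each end position the engine tests whether the next character is the one wanted. The helper `greedy_then_backtrack` is hypothetical. It ignores everything else Perl's engine does, including the fact that `.` does not match a newline. The offsets below are computed for the string as written in the command, so they may not line up with the edited trace.

```python
from typing import Callable, List, Tuple


def is_ascii_digit(c: str) -> bool:
    """Stand-in for \\d, restricted to ASCII digits for this sketch."""
    return "0" <= c <= "9"


def greedy_then_backtrack(text: str, wanted: Callable[[str], bool]) -> List[Tuple[int, bool]]:
    """Simplified model of `.*X` inside a lookahead.

    `.*` starts out matching all of `text` (end == len(text)) and backs off one
    character per step; at each end position, check whether text[end] is wanted.
    Returns the (offset, succeeded) attempts in the order they are tried.
    """
    attempts = []
    for end in range(len(text), -1, -1):  # greedy: longest `.*` first
        ok = end < len(text) and wanted(text[end])
        attempts.append((end, ok))
        if ok:
            break  # the first (i.e. rightmost) hit makes the lookahead succeed
    return attempts


for offset, ok in greedy_then_backtrack(" abc 345 DEF ", is_ascii_digit):
    print(offset, "ok" if ok else "failed...")
```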
The lookahead assertions ensure that the string contains at least one digit, one lowercase letter, and one uppercase letter.
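Put another way, the pattern checks four independent conditions. The sketch below spells them out without a regex and compares the result against the pattern. It assumes single-line ASCII input. In Python, `\d` in a str pattern also matches non-ASCII decimal digits, and `$` also matches before a trailing newline, so the two versions can disagree outside that assumption. The name `is_valid_no_regex` is hypothetical.

```python
import re

PATTERN = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$")


def is_valid_no_regex(candidate: str) -> bool:
    """One line per part of the pattern; assumes single-line ASCII input."""
    has_digit = any("0" <= c <= "9" for c in candidate)  # (?=.*\d)
    has_lower = any("a" <= c <= "z" for c in candidate)  # (?=.*[a-z])
    has_upper = any("A" <= c <= "Z" for c in candidate)  # (?=.*[A-Z])
    long_enough = len(candidate) >= 6                    # .{6,} between ^ and $
    return has_digit and has_lower and has_upper and long_enough


for s in ["Abc123", "abc123", "ABC123", "Ab1", "Passw0rd", "   aB9   "]:
    # Both approaches should agree on these single-line ASCII samples.
    assert is_valid_no_regex(s) == bool(PATTERN.match(s)), s
    print(f"{s!r}: {is_valid_no_regex(s)}")
```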
A lookahead group doesn't consume the input. Every group therefore starts scanning from the same position (the start of the string), and the same characters can be examined by each of the lookahead groups.
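One consequence is that the order of the lookaheads does not matter. This Python sketch builds the pattern with the three lookaheads in every possible order and checks that all six variants accept and reject the same sample strings. The samples and the `build` helper are illustrative only.

```python
import itertools
import re

LOOKAHEADS = [r"(?=.*\d)", r"(?=.*[a-z])", r"(?=.*[A-Z])"]


def build(order):
    """Compile ^<lookaheads in the given order>.{6,}$."""
    return re.compile("^" + "".join(order) + r".{6,}$")


samples = ["Abc123", "abc123", "ABCdef", "xY9zzz", "short"]
results = set()
for order in itertools.permutations(LOOKAHEADS):
    pattern = build(order)
    results.add(tuple(bool(pattern.match(s)) for s in samples))

# Every ordering gives the same verdicts, because each lookahead starts at position 0.
print(len(results))  # 1
```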
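You can think of it this way: search for anything (.*) until you find a digit (\d). If you find one, go back to where the group started (this is what makes it a lookahead). Now look for anything (.*) until you find a lowercase letter. Repeat for an uppercase letter. Finally, match any 6 or more characters, all the way to the end of the string. (Strictly speaking, .* is greedy: it first takes the whole rest of the string and then backs off one character at a time until a digit follows, which is what the debug trace above shows.)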
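You can watch both effects by adding capture groups inside the lookaheads. This is a hypothetical variant of the pattern, used here only for inspection. Each group records which character its lookahead settled on. Because `.*` is greedy, each lookahead finds the rightmost matching character. Because every lookahead restarts at position 0, the overall match still spans the whole string. A lazy `.*?` would settle on the leftmost match instead.

```python
import re

# Hypothetical inspection variant: capture the character each lookahead found.
pattern = re.compile(r"^(?=.*(\d))(?=.*([a-z]))(?=.*([A-Z])).{6,}$")
m = pattern.match("aB3xY9")

for group in (1, 2, 3):
    # Greedy .* backs off from the end, so each group holds the LAST such character.
    print(group, repr(m.group(group)), m.start(group))
print(m.span())  # (0, 6): the lookaheads consumed nothing; .{6,} matched it all

# With a lazy .*? the lookahead stops at the FIRST digit instead.
lazy = re.match(r"^(?=.*?(\d))", "aB3xY9")
print(lazy.group(1))  # 3
```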