I\'m trying to write a regular expression which specifies that text should start with a letter, every character should be a letter, number or underscore, there should not be
^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$
seeing how the rules are fairly complicated, I'd suggest the following:
/^[a-z](\w*)[a-z0-9]$/i
match the whole string and capture intermediate characters. Then either with the string functions or the following regex:
/__/
check if the captured part has two underscores in a row. For example in Python it would look like this:
>>> import re
>>> def valid(s):
match = re.match(r'^[a-z](\w*)[a-z0-9]$', s, re.I)
if match is not None:
return match.group(1).count('__') == 0
return False
I'll take a stab at it:
/^[a-z](?:_?[a-z0-9]+)*$/i
Explained:
/
^ # match beginning of string
[a-z] # match a letter for the first char
(?: # start non-capture group
_? # match 0 or 1 '_'
[a-z0-9]+ # match a letter or number, 1 or more times
)* # end non-capture group, match whole group 0 or more times
$ # match end of string
/i # case insensitive flag
The non-capture group takes care of a) not allowing two _
's (it forces at least one letter or number per group) and b) only allowing the last char to be a letter or number.
Some test strings:
"a": match
"_": fail
"zz": match
"a0": match
"A_": fail
"a0_b": match
"a__b": fail
"a_1_c": match
Here's a solution using a negative lookahead (not supported in all regex engines):
^[a-zA-Z](((?!__)[a-zA-Z0-9_])*[a-zA-Z0-9])?$
Test that it works as expected:
import re
tests = [
('a', True),
('_', False),
('zz', True),
('a0', True),
('A_', False),
('a0_b', True),
('a__b', False),
('a_1_c', True),
]
regex = '^[a-zA-Z](((?!__)[a-zA-Z0-9_])*[a-zA-Z0-9])?$'
for test in tests:
is_match = re.match(regex, test[0]) is not None
if is_match != test[1]:
print "fail: " + test[0]