What does the underscore mean in the following regex?
[a-zA-Z0-9_]
The _ seems to make no difference, so I don't understand its purpose.
Regular expressions are documented in perlre. That's the place to check whenever you have a question about regular expressions. The Regular-Expressions.info site is very helpful too.
To get you started, the thing you are looking at is called a "character class". A character class matches exactly one character, and that character can be any of the ones listed inside the brackets.
You can make a range of characters with a hyphen (-), so a-z is any of the lowercase letters in that range. A-Z are the uppercase letters and 0-9 are the digits. The _ is a literal underscore. Taken together, those are the legal characters for a Perl identifier (variable names and so on). That's the \w character class in the ASCII sense (and not the expanded Unicode sense).
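To see where the underscore matters, here is a minimal sketch that compares the class with and without it. The helper names and sample strings are made up for illustration. The patterns are anchored with \A and \z and use + so that the whole string has to consist of class characters. This is a guess at why the underscore looked irrelevant: an unanchored match succeeds as soon as any single character fits, so removing _ rarely changes the result.

```perl
use strict;
use warnings;

# Hypothetical helpers: return 1 if the WHOLE string is made of class characters.
sub only_with_underscore    { return $_[0] =~ /\A[a-zA-Z0-9_]+\z/ ? 1 : 0 }
sub only_without_underscore { return $_[0] =~ /\A[a-zA-Z0-9]+\z/  ? 1 : 0 }

# Illustrative sample strings, not taken from the question.
for my $s ('foobar', 'foo_bar', 'foo-bar') {
    printf "%-8s with _: %d  without _: %d\n",
        $s, only_with_underscore($s), only_without_underscore($s);
}
```

Of the three samples, only foo_bar gives different answers for the two classes: it passes when _ is in the class and fails when it is not. foobar passes both and foo-bar fails both.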
People often use that to match a Perl variable name, but there's a rule that people forget: the first character of a user-defined name has to be a letter or underscore, not a digit. That means you should use a different character class for the initial character:
[A-Za-z_][A-Za-z0-9_]*
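As a rough sketch of how that pattern might be used, the snippet below wraps it in a small check. The sub name looks_like_identifier and the sample names are hypothetical, and the check covers only the bare ASCII name, without a sigil or package qualifier. The \A and \z anchors are an addition here so that the entire string must fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $name is a letter or underscore
# followed by any number of letters, digits, or underscores.
sub looks_like_identifier {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/ ? 1 : 0;
}

# Illustrative names only.
for my $name ('count', '_private', 'x2', '2fast', 'has-dash') {
    printf "%-10s %s\n", $name, looks_like_identifier($name) ? 'ok' : 'rejected';
}
```

With these samples, count, _private, and x2 are accepted. 2fast is rejected because it starts with a digit, and has-dash is rejected because the hyphen is not in either class.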