I am reading through http://docs.python.org/2/library/re.html. According to this, the "r" in Python's re.compile(r' pattern flags') refers to the raw string
No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have special meaning in a pattern.
The r'' prefix is often used as a convenience for regexes that do need a lot of \, as it avoids the clutter of doubling up every \.
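As a minimal sketch of this point (the sample text and variable names are made up for illustration), the r prefix changes nothing about how re treats a metacharacter such as . and only affects how Python reads the backslashes:

```python
import re

text = "a.b axb"  # hypothetical sample text

# '.' is a regex metacharacter with or without the r prefix.
any_char = re.findall(r"a.b", text)

# Escaping the dot makes it literal; the raw string means the
# backslash does not have to be doubled.
literal_dot = re.findall(r"a\.b", text)

print(any_char)     # ['a.b', 'axb']
print(literal_dot)  # ['a.b']
```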
As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print('this is \n a test')
this is
a test
The r prefix tells the interpreter not to do this:
>>> print(r'this is \n a test')
this is \n a test
>>>
This is important in regular expressions, as you need the backslash to make it to the re module intact - in particular, \b matches the empty string specifically at the start or end of a word. re expects the two-character string \b, but under normal string interpretation '\b' is converted to the ASCII backspace character, so you need to either explicitly escape the backslash ('\\b') or tell Python it is a raw string (r'\b').
>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
No. As the documentation you pasted explains, the r prefix on a string indicates that the string is a raw string.
Python's string escaping and regex escaping both use the backslash character \, so the two collide. Raw strings provide a way to tell Python that you want the string left unescaped.
Examine the following:
>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print("\n")


>>> print(r"\n")
\n
Prefixing a string literal with r merely tells Python that backslashes \ should be treated literally and not as escape characters.
This is helpful when, for example, you are searching on a word boundary. The regex for this is \b, but to capture it in a normal Python string I'd need to use "\\b" as the pattern. Instead, I can use the raw string r"\b" to pattern match on.
This becomes especially handy when trying to find a literal backslash in regex. To match a backslash in regex I need to use the pattern \\. Writing that as a normal Python string means escaping each backslash, so the pattern becomes "\\\\", or the much simpler r"\\".
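To make the literal-backslash case concrete, here is a small sketch. The Windows-style path is a hypothetical sample, not something from the question:

```python
import re

path = r"C:\temp\new"  # hypothetical sample; raw so \t and \n stay as typed

# Both spellings produce the same two-character regex: \\
doubled = re.findall("\\\\", path)
raw = re.findall(r"\\", path)

print(doubled == raw)  # True
print(len(raw))        # 2, one match per backslash in the path
```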
As you can guess, in longer and more complex regexes the extra backslashes can get confusing, so raw strings are generally considered the way to go.
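For a rough sense of how the clutter grows, the sketch below writes one slightly longer pattern both ways. The pattern and sample sentence are invented for illustration:

```python
import re

# Hypothetical pattern: a word, a literal dot, then digits, e.g. "fig.12".
raw_pattern = r"\b\w+\.\d+\b"
escaped_pattern = "\\b\\w+\\.\\d+\\b"

matches = re.findall(raw_pattern, "see fig.12 and tab.3")

print(raw_pattern == escaped_pattern)  # True
print(matches)                         # ['fig.12', 'tab.3']
```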