Now, I'm quite confused.
I found this in a regex cheat sheet:
\b  word boundary
\<  start of word
\>  end of word
But the book "Mastering Regular Expressions" told me that:
\<  word boundary
\>  word boundary
What's the difference between \b and \>, \< in regex?
Summary
\b  word boundary
\<  word boundary; specifically, word boundary followed by a word; i.e., start of word
\>  word boundary; specifically, word followed by word boundary; i.e., end of word
If you have a word like "bob", then the \b word boundary pattern will return two zero-length matches, one at the start and one at the end of the word. This is useful because it lets you pick out words in strings. So the string "foo bar" matched against \b has four empty matches, for the start-end-start-end of the two words.
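To make those four empty matches easier to see, here is a small sketch that prints where each one sits instead of the empty strings themselves. The variable names are just for illustration.

```python
import re

text = "foo bar"

# Every \b match is zero-length, so the match text is always ''.
# The position of each match is the informative part.
boundary_positions = [m.start() for m in re.finditer(r"\b", text)]

print(boundary_positions)  # [0, 3, 4, 7]
```

Positions 0 and 4 are where foo and bar start; positions 3 and 7 are where they end.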
Building on that, you can then see that \< will give you the positions of just the starts of the words (two matches, for the start of foo and the start of bar) and \> the ends of the words (two matches, for the end of foo and the end of bar).
So you can express \< in terms of \b like this:
\<
is equivalent to start-of-word
is equivalent to word-boundary-followed-by-word
is equivalent to \b(?=\w)
I think your "Mastering Regular Expressions" book is being a bit fuzzy in describing \< and \> simply as word boundaries. It should be more precise and distinguish them as "word boundary (specifically for start of word)" and "word boundary (specifically for end of word)" respectively.
Python example:
>>> import re
>>> re.compile(r'\b').findall('foo bar')
['', '', '', '']
>>> re.compile(r'\b(?=\w)').findall('foo bar')
['', '']
Note that Python doesn't support \< and \>.
And here's an example of why word boundaries are useful. We can pick out the BAR that is an entire word, rather than the one wrapped up inside foo:
>>> re.compile(r'\bBAR\b').findall('foBARo BAR')
['BAR']
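Since Python's re module has no \< or \>, here is a minimal sketch of emulating both with lookarounds. The start-of-word pattern is the \b(?=\w) from above; the end-of-word pattern \b(?<=\w) is its mirror image and is an assumption extending that equivalence. The helper names word_starts and word_ends are made up for this example.

```python
import re

# Python's re has no \< or \>, so both are emulated with \b plus a lookaround.
WORD_START = re.compile(r"\b(?=\w)")   # boundary followed by a word character, like \<
WORD_END = re.compile(r"\b(?<=\w)")    # boundary preceded by a word character, like \>


def word_starts(text):
    """Return the positions where a word begins (hypothetical helper)."""
    return [m.start() for m in WORD_START.finditer(text)]


def word_ends(text):
    """Return the positions where a word ends (hypothetical helper)."""
    return [m.start() for m in WORD_END.finditer(text)]


print(word_starts("foo bar"))  # [0, 4]
print(word_ends("foo bar"))    # [3, 7]
```

Taken together, the two lists cover exactly the four positions that a plain \b matches in "foo bar".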
Source: https://stackoverflow.com/questions/27723018/whats-the-difference-between-b-and-in-regex