What's the difference between \b and \>, \< in regex?

Posted by 匆匆过客 on 2019-12-06 07:35:52

Summary

\b    word boundary
\<    word boundary; specifically, a word boundary followed by a word character; i.e., start of word
\>    word boundary; specifically, a word character followed by a word boundary; i.e., end of word

If you have a word like "bob", the \b word-boundary pattern produces two zero-length matches, one at the start of the word and one at the end. This is useful because it lets you pick out whole words in a string. So the string "foo bar" matched against \b yields four empty matches: start, end, start, end for the two words.

Building on that, \< gives you only the positions where words start (two matches, at the start of foo and the start of bar) and \> only the positions where words end (two matches, at the end of foo and the end of bar).

So you can express \< in terms of \b like this:

  \< 
is equivalent to
  start-of-word 
is equivalent to
  word-boundary-followed-by-word 
is equivalent to
  \b(?=\w)
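
By the same reasoning, \> (a word character followed by a word boundary) can be written as \b(?<=\w). The sketch below is only an illustration of that equivalence in Python, which has no \< or \> of its own; the variable names are made up for the example. It lists the offsets of each kind of match in "foo bar" rather than the empty strings themselves.

```python
import re

text = "foo bar"

# Every word boundary: the start and the end of each word.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Start-of-word only: a boundary with a word character after it (emulates \<).
starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]

# End-of-word only: a boundary with a word character before it (emulates \>).
ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]

print(boundaries)  # [0, 3, 4, 7]
print(starts)      # [0, 4]
print(ends)        # [3, 7]
```

The start and end offsets together make up exactly the set of \b offsets, which is the sense in which \< and \> are each a more specific kind of word boundary.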

I think your "Mastering Regular Expressions" book is being a bit fuzzy here: it describes \< and \> simply as word boundaries, when it should be more precise and distinguish them as "word boundary (specifically for start of word)" and "word boundary (specifically for end of word)" respectively.

Python example:

>>> import re
>>> re.compile(r'\b').findall('foo bar')
['', '', '', '']
>>> re.compile(r'\b(?=\w)').findall('foo bar')
['', '']

Note that Python's re module doesn't support \< and \>. And here's an example of why word boundaries are useful: we can pick out the BAR that is an entire word, rather than the one embedded inside foBARo:

>>> re.compile(r'\bBAR\b').findall('foBARo BAR')
['BAR']
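
If you need this whole-word check for arbitrary search terms, one possible way to package it is sketched below. The helper name find_whole_word is hypothetical, not part of the re module, and the sketch assumes the search term begins and ends with a word character, since \b only fires between a word character and a non-word character.

```python
import re

def find_whole_word(word, text):
    """Return the start offsets where `word` occurs as a complete word in `text`.

    Hypothetical helper for illustration; assumes `word` starts and ends
    with a word character (letter, digit, or underscore).
    """
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [m.start() for m in pattern.finditer(text)]

print(find_whole_word("BAR", "foBARo BAR"))  # [7]
```

Only the standalone BAR at offset 7 is reported; the one inside foBARo has word characters on both sides, so neither \b can match there.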