R regex - extract words beginning with @ symbol

太阳男子 2021-01-18 04:09

I'm trying to extract Twitter handles from tweets using R's stringr package. For example, suppose I want to get all words in a vector that begin with "A". I can do this

3 answers
  •  傲寒 (thread starter)
    2021-01-18 04:28

    It looks like you probably mean

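    library(stringr)

    # Match "@" plus the following non-whitespace, only at string start or after whitespace
    str_extract_all(c("h@i", "hi @hello @me", "@twitter"), "(?<=^|\\s)@[^\\s]+")
    # [[1]]
    # character(0)
    # [[2]]
    # [1] "@hello" "@me"
    # [[3]]
    # [1] "@twitter"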
    

    The \b in a regular expression is a word boundary: it occurs "Between two characters in the string, where one is a word character and the other is not a word character." Since a space and "@" are both non-word characters, there is no boundary before the "@" in a string like "hi @hello", so a pattern that puts \b directly before "@" cannot match there.
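
    To see the boundary rule in action, here is a small sketch using the same three test strings. The pattern "\\b@\\w+" is only an illustration of putting \b before "@"; it is not taken from the question, whose code is not shown.

```r
library(stringr)

x <- c("h@i", "hi @hello @me", "@twitter")

# \b needs a word character on exactly one side of the position.
# Only "h@i" has one before the "@" (the "h"), so only it matches.
res <- str_extract_all(x, "\\b@\\w+")
res
# [[1]]
# [1] "@i"
# [[2]]
# character(0)
# [[3]]
# character(0)
```

    This is the reverse of what is wanted: the mid-word "@" matches, while the real handles after a space or at the start of the string do not.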

    With this revision, the lookbehind (?<=^|\\s) requires each "@" to sit at the start of the string or immediately after a whitespace character, which is why "h@i" returns no match.
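
    One thing to watch: [^\\s]+ takes every non-whitespace character, so punctuation right after a handle is captured too. A possible variant, assuming handles consist only of letters, digits and underscores, swaps in \\w+. The helper name extract_handles is hypothetical and used here only for illustration.

```r
library(stringr)

# Hypothetical helper: same lookbehind as the answer, but \\w+ instead of
# [^\\s]+ (assumes handles contain only word characters).
extract_handles <- function(text) {
  str_extract_all(text, "(?<=^|\\s)@\\w+")
}

handles <- extract_handles(c("h@i", "thanks @hello, and @me!", "@twitter"))
handles
# [[1]]
# character(0)
# [[2]]
# [1] "@hello" "@me"
# [[3]]
# [1] "@twitter"
```

    With the original [^\\s]+ the second element would instead come back as "@hello," and "@me!".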
