R regex - extract words beginning with @ symbol

太阳男子 2021-01-18 04:09

I'm trying to extract Twitter handles from tweets using R's stringr package. For example, suppose I want to get all words in a vector that begin with "A". I can do this

3 answers
  •  说谎 (original poster)
    2021-01-18 04:32

    The answer above should suffice. The variant below also drops the @ symbol, in case you only want the usernames.

    library(stringr)
    str_extract_all(c("@tweeter tweet", "h@is", "tweet @tweeter2"), "(?<=\\B\\@)[^\\s]+")
    [[1]]
    [1] "tweeter"
    
    [[2]]
    character(0)
    
    [[3]]
    [1] "tweeter2"
    

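    While I am no expert with regex, it seems the issue is that the @ symbol is not a word character. \b (written "\\b" in an R string) matches only at a position between a word character and a non-word character, so there is no word boundary in front of an @ that starts the string or follows a space, and a pattern such as \b@ fails there. \B matches wherever \b does not, which is why the pattern above uses it in front of the @.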
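
    To make the difference concrete, here is a small sketch that runs both anchors over the same three strings. The variable names are made up for illustration, and using \w+ for the handle (letters, digits, underscore) is an assumption on my part rather than something from the question.

    ```r
    library(stringr)

    # Same sample strings as in the example above.
    tweets <- c("@tweeter tweet", "h@is", "tweet @tweeter2")

    # \b requires a word character on exactly one side of the position.
    # Before a leading "@" there is none, so only the "@" inside "h@is" matches.
    with_b <- str_extract_all(tweets, "\\b@\\w+")

    # \B matches where there is no word boundary, i.e. before an "@" that
    # starts the string or follows a space, and not inside "h@is".
    with_B <- str_extract_all(tweets, "\\B@\\w+")

    print(with_b)
    print(with_B)
    ```

    With these inputs, the \b version should pick up only "@is" from the middle string, while the \B version should return "@tweeter" and "@tweeter2" and nothing for "h@is". Wrapping the \B@ part in a lookbehind, as in the pattern above, keeps the same matches but leaves the @ out of the result.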

    Here are two great regex resources in case you hadn't seen them:

    • stat545
    • Stringr's Regex page, also available as a vignette:

      vignette("regular-expressions", package = "stringr")
