In R: grab all alnum characters before the first punctuation

無奈伤痛 2021-01-26 21:06

I have a vector s of strings (or NAs) and would like to get a vector of the same length containing everything before the first occurrence of punctuation (.).

1 answer
  • 2021-01-26 21:31

    You can remove everything from the first dot to the end of the string (including any newlines) with the following PCRE regex, hence perl = TRUE:

    s <- c("ABC1.2", "22A.2", NA)
    gsub("[.][\\s\\S]*$", "", s, perl = TRUE)
    ## => [1] "ABC1" "22A"  NA  
    

    See IDEONE demo

    The regex matches

    • [.] - a literal dot (inside a character class the dot is not a wildcard)
    • [\\s\\S]* - zero or more characters of any kind, including newlines
    • $ - the end of the string.

    All matched text is replaced with "", i.e. removed. Because the regex engine scans the string from left to right, [.] matches the first dot, and the greedy * quantifier applied to [\\s\\S] then consumes everything up to the end of the string.
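The following is a minimal sketch, assuming the same gsub() call as above, that checks the two claims just made: the match starts at the first dot and runs past a newline to the end of the string. The sample strings are made up for illustration. It also contrasts the pattern with [.].*$ under perl = TRUE, where the PCRE dot does not match a newline by default.

```r
# Hypothetical inputs: a newline after the dot, a plain case, no dot at all, and NA
s <- c("ABC1.2\nmore text", "22A.2", "no dot here", NA)

# [.][\s\S]*$ : first literal dot, then any characters (newlines included) to the end
res <- gsub("[.][\\s\\S]*$", "", s, perl = TRUE)
print(res)
# returns c("ABC1", "22A", "no dot here", NA): strings without a dot and NA pass through

# With PCRE, "." stops at "\n", so "$" cannot be reached and nothing is removed
res_dot <- gsub("[.].*$", "", "ABC1.2\nmore text", perl = TRUE)
print(res_dot)
# returns the input unchanged: "ABC1.2\nmore text"
```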

    If the strings contain no newlines, the simpler regex [.].*$ will do:

    gsub("[.].*$", "", s)
    

    See another demo
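The question title asks about the first punctuation character in general, not only the dot. A minimal sketch, assuming "punctuation" means the POSIX [[:punct:]] class (which, for ASCII, includes "_" and "-"), swaps the dot for that class. The second variant, an assumption about what "alnum characters" in the title means, keeps only the leading run of [[:alnum:]] characters, so it also stops at spaces. sub() is enough here because each pattern runs to the end of the string and can match only once. The variable names are illustrative.

```r
# Hypothetical inputs with different punctuation characters, a space, and NA
s <- c("ABC1.2", "22A-2", "X_9,1", "AB C.1", NA)

# Everything before the first punctuation character of any kind
before_punct <- sub("[[:punct:]].*$", "", s, perl = TRUE)
print(before_punct)
# returns c("ABC1", "22A", "X", "AB C", NA)

# Only the leading run of letters and digits (stops at spaces as well)
leading_alnum <- sub("^([[:alnum:]]*).*$", "\\1", s, perl = TRUE)
print(leading_alnum)
# returns c("ABC1", "22A", "X", "AB", NA)
```

Both calls leave NA as NA, so the result has the same length as the input.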
