R split string at last whitespace chars using tidyr::separate

余生分开走 2021-02-14 08:00

Suppose I have a dataframe like this:

df <- data.frame(a = c("AA", "BB"), b = c("short string", "this is the longer string"))

I would like to

2 answers
  • 2021-02-14 08:46

    You may turn the [^ ]*$ part of your regex into a (?=[^ ]*$) non-consuming pattern, a positive lookahead (that will not consume the non-whitespace chars at the end of the string, i.e. they won't be put into the match value and thus will stay there in the output):

    library(tidyr)  # provides separate() and re-exports %>%

    df %>%
      separate(b, c("partA", "partB"), sep = " (?=[^ ]*$)")
    

    Or, a bit more universal since it matches any whitespace chars:

    df %>%
      separate(b,c("partA","partB"),sep="\\s+(?=\\S*$)")
    


    Output:

       a              partA  partB
    1 AA              short string
    2 BB this is the longer string
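
To see what the lookahead pattern matches without tidyr, here is a minimal base-R sketch. It assumes every string contains at least one whitespace character (otherwise `regexpr` returns -1 and the `substr` arithmetic below no longer applies). The vector `x` and the names `pos` and `len` are illustrative and not part of the original question.

```r
# Base-R sketch of the same split; no packages required.
x <- c("short string", "this is the longer string")

# perl = TRUE selects PCRE, which is needed for the (?=...) lookahead.
m <- regexpr("\\s+(?=\\S*$)", x, perl = TRUE)
pos <- as.integer(m)                     # start of the last whitespace run
len <- attr(m, "match.length")           # length of that run

partA <- substr(x, 1, pos - 1)           # everything before the run
partB <- substr(x, pos + len, nchar(x))  # everything after the run

print(data.frame(partA, partB))
```

Because the lookahead consumes nothing, only the whitespace run itself is removed, and the last word stays intact in `partB`.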
    
  • 2021-02-14 08:55

    We can use extract from tidyr with capture groups ((...)). The first group matches zero or more characters ((.*)); because .* is greedy, it runs up to the last whitespace. It is followed by one or more whitespace characters (\\s+), and then by a second capture group that matches only characters that are not a space ([^ ]+) up to the end ($) of the string.

    library(tidyr)
    extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
    #   a              partA  partB
    #1 AA              short string
    #2 BB this is the longer string
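
The capture groups can also be inspected in base R, which may help when debugging the pattern. This is a rough sketch, not what `extract` does internally. It assumes every element of the illustrative vector `x` matches the pattern, since a non-matching element yields an empty result and would be dropped by `rbind`.

```r
x <- c("short string", "this is the longer string")

# regexec() reports the full match followed by one entry per capture group.
m <- regmatches(x, regexec("(.*)\\s+([^ ]+)$", x))

# Stack the results, drop the full match (column 1), keep the two groups.
parts <- do.call(rbind, m)[, -1, drop = FALSE]
colnames(parts) <- c("partA", "partB")
print(parts)
```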
    