Split Character String Using Only Last Delimiter in R

暖寄归人 2021-01-16 07:38

I have a character variable that I would like to split into 2 variables based on a "-" delimiter. However, I would only like to split on the last delimiter, as there

4 Answers
  • 2021-01-16 08:02

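    You can try using gregexpr:

    a <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")
    # gregexpr() returns one vector of match positions per string;
    # keep the last position for each element, not just the first string
    lastdelim <- sapply(gregexpr("-", a, fixed = TRUE), function(m) tail(m, n = 1))
    # substr() is vectorised; trimws() drops the spaces around "foo - bar"
    output1 <- trimws(substr(a, 1, lastdelim - 1))
    output2 <- trimws(substr(a, lastdelim + 1, nchar(a)))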
    
  • 2021-01-16 08:06

    A solution based on stringi and data.table: reverse each string, split it into at most two pieces (n = 2), and then reverse the pieces back:

    library(stringi)
    x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')
    
    lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
    

    If we want to make a data.frame with this:

    y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
    
    y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))
    
    df <- as.data.frame(c(list(input = x), y))
    
    # > df
    #          input output1 output2
    # 1    foo - bar     foo     bar
    # 2  hey-now-man hey-now     man
    # 3 say-now-girl say-now    girl
    # 4     fine-now    fine     now
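
    The reverse, split, reverse idea can also be sketched in base R, which may help show why it isolates the last delimiter. The helper names rev_string and split_last are hypothetical and not part of any package; the pattern assumes optional whitespace around the hyphen, as in the example data.

    ```r
    # Reverse every string in a character vector, character by character
    rev_string <- function(s) {
      vapply(strsplit(s, ""), function(ch) paste(rev(ch), collapse = ""), character(1))
    }

    split_last <- function(s, pattern = "\\s*-\\s*") {
      reversed <- rev_string(s)
      # regexpr() finds only the FIRST match, which is the LAST delimiter of the original
      m <- regexpr(pattern, reversed)
      # invert = TRUE returns the text before and after that single match
      pieces <- regmatches(reversed, m, invert = TRUE)
      # each element is c(reversed tail, reversed head): reverse back and swap the order
      lapply(pieces, function(p) rev(rev_string(p)))
    }

    x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")
    split_last(x)
    ```

    Strings without any delimiter come back as a single-element vector, so a caller would need to handle that case separately.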
    
  • 2021-01-16 08:11

    Using unglue, you could do:

    # install.packages("unglue")
    library(unglue)
    df <- data.frame(input = c("foo - bar","hey-now-man","say-now-girl","fine-now"))
    unglue_unnest(df, input, "{output1}{=\\s*-\\s*}{output2=[^-]+}", remove = FALSE)
    #>          input output1 output2
    #> 1    foo - bar     foo     bar
    #> 2  hey-now-man hey-now     man
    #> 3 say-now-girl say-now    girl
    #> 4     fine-now    fine     now
    

    Created on 2019-11-06 by the reprex package (v0.3.0)

  • 2021-01-16 08:20

    You can also use a negative lookahead:

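    library(tibble)  # tibble()
    library(tidyr)   # separate() and the %>% pipe

    df <- tibble(input = c("foo - bar", "hey-now-man", "say-now-girl", "fine-now"))

    # "\\-(?!.*-)" matches a hyphen that is not followed by any later hyphen
    df %>% 
        separate(input, into = c("output1", "output2"), sep = "\\-(?!.*-)", remove = FALSE)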
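
    The same lookahead also works without tidyr. The following is a minimal base R sketch, assuming perl = TRUE is acceptable; the extra \\s* around the hyphen is an addition not in the pattern above, included only so that "foo - bar" splits without stray spaces.

    ```r
    x <- c("foo - bar", "hey-now-man", "say-now-girl", "fine-now")

    # perl = TRUE enables lookahead; (?!.*-) rejects any hyphen that has another hyphen after it
    parts <- strsplit(x, "\\s*-\\s*(?!.*-)", perl = TRUE)

    result <- data.frame(
      input   = x,
      output1 = sapply(parts, `[`, 1),
      output2 = sapply(parts, `[`, 2),
      stringsAsFactors = FALSE
    )
    result
    ```

    A string with no hyphen would yield NA in output2, since the second piece does not exist.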
    

    Refs:

    [1] https://frightanic.com/software-development/regex-match-last-occurrence/

    [2] https://www.regular-expressions.info/lookaround.html
