Split strings on first and last commas

生来不讨喜 2021-01-13 16:05

I would like to split strings on the first and last comma. Each string has at least two commas. Below is an example data set and the desired result.
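
The example data did not survive on this page. The listing below is taken from the read.table() call in one of the answers further down, so treat it as a reconstruction rather than the asker's original listing.

```r
# Example data, copied from the read.table() call in a later answer.
my.data <- read.table(text = '
my.string        some.data
123,34,56,78,90     10
87,65,43,21         20
a4,b6,c8888         30
11,bbbb,ccccc       40
uu,vv,ww,xx         50
j,k,l,m,n,o,p       60', header = TRUE, stringsAsFactors = FALSE)

str(my.data)
```

Every answer below assumes a data frame of this shape: a character column my.string and a numeric column some.data.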

A similar ques

5 answers
  • 2021-01-13 16:32

    Using str_match() from the stringr package, with a little help from one of your links:

    library(stringr)
    # Drop column 1 (the full match) and keep the three capture groups.
    data.frame(str_match(my.data$my.string, "(.+?),(.*),(.+?)$")[,-1],
               some.data = my.data$some.data)
    #    X1        X2    X3 some.data
    # 1 123  34,56,78    90        10
    # 2  87     65,43    21        20
    # 3  a4        b6 c8888        30
    # 4  11      bbbb ccccc        40
    # 5  uu     vv,ww    xx        50
    # 6   j k,l,m,n,o     p        60
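
    For readers without stringr, a roughly equivalent base-R sketch uses regexec() and regmatches(). The anchored pattern and the variable names here are illustrative choices, not part of the answer above.

    ```r
    my.string <- c("123,34,56,78,90", "a4,b6,c8888", "j,k,l,m,n,o,p")

    # Three capture groups: before the first comma, between the first and
    # last commas, and after the last comma.
    pattern <- "^([^,]*),(.*),([^,]*)$"
    m <- regmatches(my.string, regexec(pattern, my.string))

    # Each list element holds the full match followed by the three groups;
    # drop the full match and stack the groups into a matrix.
    parts <- do.call(rbind, lapply(m, `[`, -1))
    parts
    ```

    The resulting character matrix has one row per input string and can be passed to data.frame() in the same way as the str_match() result.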
    
  • 2021-01-13 16:38

    Here is a relatively simple approach. In the first line we use sub to replace the first and last commas with semicolons producing s. Then we read s using sep=";" and finally cbind the rest of my.data to it:

    s <- sub(",(.*),", ";\\1;", my.data[[1]])
    DF <- read.table(text=s, sep =";", col.names=paste0("mystring",1:3), as.is=TRUE)
    cbind(DF, my.data[-1])
    

    giving:

      mystring1 mystring2 mystring3 some.data
    1       123  34,56,78        90        10
    2        87     65,43        21        20
    3        a4        b6     c8888        30
    4        11      bbbb     ccccc        40
    5        uu     vv,ww        xx        50
    6         j k,l,m,n,o         p        60
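
    To see why a single sub() call is enough, it may help to run the substitution on one string. This is a small sketch; the sample string is taken from the first row of the example data.

    ```r
    x <- "123,34,56,78,90"

    # The match starts at the first comma, and the greedy (.*) then runs as far
    # as it can while still leaving a final comma to match, i.e. the last one.
    s <- sub(",(.*),", ";\\1;", x)
    s

    # Splitting on the semicolons yields the three pieces.
    pieces <- strsplit(s, ";", fixed = TRUE)[[1]]
    pieces
    ```

    Note that this relies on the data containing no semicolons of its own.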
    
  • 2021-01-13 16:38

    You can use the \K operator, which keeps the text already matched out of the reported match, together with a positive lookahead assertion to do this. (Well, almost: an annoying comma is left at the start of the middle portion, which I have yet to get rid of in the strsplit call.) But I enjoyed this as an exercise in constructing a regex...

    x <- '123,34,56,78,90'
    strsplit( x , "^[^,]+\\K|,(?=[^,]+$)" , perl = TRUE )
    #[[1]]
    #[1] "123"       ",34,56,78" "90"
    

    Explanation:

    • ^[^,]+ : from the start of the string, match one or more characters that are not a ,
    • \\K : but don't include those matched characters in the match
    • So the first split point is the empty position just before the first comma, which is why that comma stays attached to the middle portion...
    • | : or you can match...
    • ,(?=[^,]+$) : a , so long as it is followed by [(?=...)] one or more characters that are not a , until the end of the string ($)...
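
    One simple way to deal with the leftover comma is to strip it in a second step rather than inside the regex. This is a sketch of that workaround; the variable name parts is illustrative.

    ```r
    x <- "123,34,56,78,90"

    # Same split as above: the middle piece still carries its leading comma.
    parts <- strsplit(x, "^[^,]+\\K|,(?=[^,]+$)", perl = TRUE)[[1]]

    # Remove a single leading comma from the middle piece.
    parts[2] <- sub("^,", "", parts[2])
    parts
    ```

    This keeps the \K pattern unchanged and treats the cleanup as ordinary post-processing.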
  • 2021-01-13 16:38

    Here is code to split on the first and last comma. It draws heavily on an answer by @bdemarest to the question "Split string on first two colons". The gsub pattern below, which is the meat of the answer, differs from that one in important ways. The code that builds the new data frame after the strings are split is the same as @bdemarest's.

    # Replace first and last commas with colons.
    
    new.string <- gsub(pattern="(^[^,]+),(.+),([^,]+$)", 
                  replacement="\\1:\\2:\\3", x=my.data$my.string)
    new.string
    
    # Split on colons
    split.data <- strsplit(new.string, ":")
    
    # Create data frame
    new.data <- data.frame(do.call(rbind, split.data))
    names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
    
    my.data$my.string <- NULL
    my.data <- cbind(new.data, my.data)
    my.data
    
    #   my.string1 my.string2 my.string3 some.data
    # 1        123   34,56,78         90        10
    # 2         87      65,43         21        20
    # 3         a4         b6      c8888        30
    # 4         11       bbbb      ccccc        40
    # 5         uu      vv,ww         xx        50
    # 6          j  k,l,m,n,o          p        60
    
    
    
    # Here is code for splitting strings on the first comma
    
    my.data <- read.table(text='
    
    my.string        some.data
    123,34,56,78,90     10
    87,65,43,21         20
    a4,b6,c8888         30
    11,bbbb,ccccc       40
    uu,vv,ww,xx         50
    j,k,l,m,n,o,p       60', header = TRUE, stringsAsFactors=FALSE)
    
    
    # Replace first comma with colon
    
    new.string <- gsub(pattern="(^[^,]+),(.+$)", 
                       replacement="\\1:\\2", x=my.data$my.string)
    new.string
    
    # Split on colon
    split.data <- strsplit(new.string, ":")
    
    # Create data frame
    new.data <- data.frame(do.call(rbind, split.data))
    names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
    
    my.data$my.string <- NULL
    my.data <- cbind(new.data, my.data)
    my.data
    
    #   my.string1  my.string2 some.data
    # 1        123 34,56,78,90        10
    # 2         87    65,43,21        20
    # 3         a4    b6,c8888        30
    # 4         11  bbbb,ccccc        40
    # 5         uu    vv,ww,xx        50
    # 6          j k,l,m,n,o,p        60
    
    
    
    
    # Here is code for splitting strings on the last comma
    
    my.data <- read.table(text='
    
    my.string        some.data
    123,34,56,78,90     10
    87,65,43,21         20
    a4,b6,c8888         30
    11,bbbb,ccccc       40
    uu,vv,ww,xx         50
    j,k,l,m,n,o,p       60', header = TRUE, stringsAsFactors=FALSE)
    
    
    # Replace last comma with colon
    
    new.string <- gsub(pattern="^(.+),([^,]+$)", 
                       replacement="\\1:\\2", x=my.data$my.string)
    new.string
    
    # Split on colon
    split.data <- strsplit(new.string, ":")
    
    # Create new data frame
    new.data <- data.frame(do.call(rbind, split.data))
    names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
    
    my.data$my.string <- NULL
    my.data <- cbind(new.data, my.data)
    my.data
    
    #     my.string1 my.string2 some.data
    # 1 123,34,56,78         90        10
    # 2     87,65,43         21        20
    # 3        a4,b6      c8888        30
    # 4      11,bbbb      ccccc        40
    # 5     uu,vv,ww         xx        50
    # 6  j,k,l,m,n,o          p        60
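
    The three patterns are easier to compare side by side on a single string. The sketch below uses sub() rather than gsub(), since each anchored pattern can match at most once per string, and adds a guard that is an assumption of this sketch: a colon is only a safe temporary delimiter if it never occurs in the data.

    ```r
    x <- "j,k,l,m,n,o,p"

    # Assumption: ":" must not already appear in the strings being split.
    stopifnot(!grepl(":", x, fixed = TRUE))

    # First and last comma, first comma only, last comma only.
    split.both  <- sub("(^[^,]+),(.+),([^,]+$)", "\\1:\\2:\\3", x)
    split.first <- sub("(^[^,]+),(.+$)", "\\1:\\2", x)
    split.last  <- sub("^(.+),([^,]+$)", "\\1:\\2", x)

    c(split.both, split.first, split.last)

    # Splitting on the colon recovers the pieces.
    pieces <- strsplit(split.both, ":", fixed = TRUE)[[1]]
    pieces
    ```

    If the data can contain colons, another character that is known to be absent would have to be used as the temporary delimiter.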
    
  • 2021-01-13 16:39

    You can do a simple strsplit on that column:

    popshift<-sapply(strsplit(my.data$my.string,","), function(x) 
        c(x[1], paste(x[2:(length(x)-1)],collapse=","), x[length(x)]))
    
    desired.result <- cbind(data.frame(my.string=t(popshift)), my.data[-1])
    

    I just split up all the values and build a new vector from the first, middle and last strings. Then I cbind that with the rest of the data. The result is

      my.string.1 my.string.2 my.string.3 some.data
    1         123    34,56,78          90        10
    2          87       65,43          21        20
    3          a4          b6       c8888        30
    4          11        bbbb       ccccc        40
    5          uu       vv,ww          xx        50
    6           j   k,l,m,n,o           p        60
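
    The same idea can be written as a small helper that indexes the middle with negative subscripts. This is a sketch; split_first_last is a hypothetical name, not a function from any package. One difference from the code above: for a string with only one comma, 2:(length(x)-1) evaluates to 2:1 and picks the fields in reverse order, whereas x[-c(1, n)] yields an empty middle.

    ```r
    # Hypothetical helper: split one string into first, middle and last parts.
    split_first_last <- function(s) {
      x <- strsplit(s, ",", fixed = TRUE)[[1]]
      n <- length(x)
      c(first  = x[1],
        middle = paste(x[-c(1, n)], collapse = ","),  # everything between
        last   = x[n])
    }

    res <- split_first_last("123,34,56,78,90")
    res

    # Apply to a whole column and transpose so each string becomes a row.
    strings <- c("a4,b6,c8888", "j,k,l,m,n,o,p")
    mat <- t(sapply(strings, split_first_last, USE.NAMES = FALSE))
    mat
    ```

    The matrix returned by the last step can be combined with the remaining columns using cbind(), as in the answer above.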
    