split strings on first and last commas


I would like to split strings on the first and last comma. Each string has at least two commas. Below is an example data set and the desired result.

A similar ques


5 Answers

醉酒成梦

2021-01-13 16:32

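Using str_match() from the stringr package, with a little help from one of your links. The lazy first group stops at the first comma, and the greedy middle group runs up to the last comma: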

> library(stringr)
> data.frame(str_match(my.data$my.string, "(.+?),(.*),(.+?)$")[,-1], 
             some.data = my.data$some.data)
#    X1        X2    X3 some.data
# 1 123  34,56,78    90        10
# 2  87     65,43    21        20
# 3  a4        b6 c8888        30
# 4  11      bbbb ccccc        40
# 5  uu     vv,ww    xx        50
# 6   j k,l,m,n,o     p        60
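The same capture-group idea can be sketched in base R without stringr. This is only an illustration: the pattern is a slight variant that uses `[^,]*` for the outer groups, and the two sample strings and the name `parts` are picked here for the example rather than taken from the answer.

```r
# Two of the sample strings from the question
my.string <- c("123,34,56,78,90", "a4,b6,c8888")

# regexec() returns the full match plus one entry per capture group
m <- regmatches(my.string, regexec("^([^,]*),(.*),([^,]*)$", my.string))

# Drop the full match (first element) and stack the groups into a matrix
parts <- do.call(rbind, lapply(m, `[`, -1))
parts
#      [,1]  [,2]       [,3]
# [1,] "123" "34,56,78" "90"
# [2,] "a4"  "b6"       "c8888"
```

The matrix can then be passed to data.frame() and combined with the other columns as in the answer above.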


旧巷少年郎

2021-01-13 16:38

Here is a relatively simple approach. In the first line we use sub to replace the first and last commas with semicolons, producing s. Then we read s using sep=";" and finally cbind the rest of my.data to it. This assumes the strings themselves contain no semicolons:

s <- sub(",(.*),", ";\\1;", my.data[[1]])
DF <- read.table(text=s, sep =";", col.names=paste0("mystring",1:3), as.is=TRUE)
cbind(DF, my.data[-1])

giving:

  mystring1 mystring2 mystring3 some.data
1       123  34,56,78        90        10
2        87     65,43        21        20
3        a4        b6     c8888        30
4        11      bbbb     ccccc        40
5        uu     vv,ww        xx        50
6         j k,l,m,n,o         p        60


孤独总比滥情好

2021-01-13 16:38
You can use the \K operator, which keeps text already matched out of the result, together with a positive lookahead assertion to do this. (Well, almost: there is an annoying comma at the start of the middle portion which I have yet to get rid of in the strsplit.) But I enjoyed this as an exercise in constructing a regex...
```
x <- '123,34,56,78,90'
strsplit( x , "^[^,]+\\K|,(?=[^,]+$)" , perl = TRUE )
#[[1]]
#[1] "123"       ",34,56,78" "90"
```
Explanation:
- ^[^,]+ : from the start of the string, match one or more characters that are not a ,
- \\K : but don't include those matched characters in the match
- So the first split point is the zero-width position just before the first comma, which is why that comma stays attached to the middle piece
- | : or you can match...
- ,(?=[^,]+$) : a , so long as it is followed by [(?=...)] one or more characters that are not a , up to the end of the string ($). This is the last comma.
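One way to deal with the leftover comma is to strip it in a second step rather than inside the split pattern. The following is a minimal sketch under that assumption; `split_first_last` is a hypothetical helper name introduced here, not part of the original answer.

```r
# Hypothetical helper: split with the pattern from the answer, then drop the
# single leading comma that remains on the middle piece.
split_first_last <- function(x) {
  parts <- strsplit(x, "^[^,]+\\K|,(?=[^,]+$)", perl = TRUE)
  lapply(parts, function(p) {
    p[2] <- sub("^,", "", p[2])  # remove only the first character if it is a comma
    p
  })
}

res <- split_first_last(c("123,34,56,78,90", "a4,b6,c8888"))
res
```

Post-processing keeps the split pattern unchanged, at the cost of one extra pass over the middle elements.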

长发绾君心

2021-01-13 16:38

Here is code to split on the first and last comma. This code draws heavily from an answer by @bdemarest to the question "Split string on first two colons". The gsub pattern below, which is the meat of the answer, contains important differences. The code for creating the new data frame after the strings are split is the same as that of @bdemarest. Note that the colon is used as a temporary delimiter, so this assumes the strings contain no colons.

# Replace first and last commas with colons.

new.string <- gsub(pattern="(^[^,]+),(.+),([^,]+$)", 
              replacement="\\1:\\2:\\3", x=my.data$my.string)
new.string

# Split on colons
split.data <- strsplit(new.string, ":")

# Create data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")

my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data

#   my.string1 my.string2 my.string3 some.data
# 1        123   34,56,78         90        10
# 2         87      65,43         21        20
# 3         a4         b6      c8888        30
# 4         11       bbbb      ccccc        40
# 5         uu      vv,ww         xx        50
# 6          j  k,l,m,n,o          p        60



# Here is code for splitting strings on the first comma

my.data <- read.table(text='

my.string        some.data
123,34,56,78,90     10
87,65,43,21         20
a4,b6,c8888         30
11,bbbb,ccccc       40
uu,vv,ww,xx         50
j,k,l,m,n,o,p       60', header = TRUE, stringsAsFactors=FALSE)


# Replace first comma with colon

new.string <- gsub(pattern="(^[^,]+),(.+$)", 
                   replacement="\\1:\\2", x=my.data$my.string)
new.string

# Split on colon
split.data <- strsplit(new.string, ":")

# Create data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")

my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data

#   my.string1  my.string2 some.data
# 1        123 34,56,78,90        10
# 2         87    65,43,21        20
# 3         a4    b6,c8888        30
# 4         11  bbbb,ccccc        40
# 5         uu    vv,ww,xx        50
# 6          j k,l,m,n,o,p        60




# Here is code for splitting strings on the last comma

my.data <- read.table(text='

my.string        some.data
123,34,56,78,90     10
87,65,43,21         20
a4,b6,c8888         30
11,bbbb,ccccc       40
uu,vv,ww,xx         50
j,k,l,m,n,o,p       60', header = TRUE, stringsAsFactors=FALSE)


# Replace last comma with colon

new.string <- gsub(pattern="^(.+),([^,]+$)", 
                   replacement="\\1:\\2", x=my.data$my.string)
new.string

# Split on colon
split.data <- strsplit(new.string, ":")

# Create new data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")

my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data

#     my.string1 my.string2 some.data
# 1 123,34,56,78         90        10
# 2     87,65,43         21        20
# 3        a4,b6      c8888        30
# 4      11,bbbb      ccccc        40
# 5     uu,vv,ww         xx        50
# 6  j,k,l,m,n,o          p        60


时光说笑

2021-01-13 16:39

You can do a simple strsplit here on that column

popshift<-sapply(strsplit(my.data$my.string,","), function(x) 
    c(x[1], paste(x[2:(length(x)-1)],collapse=","), x[length(x)]))

desired.result <- cbind(data.frame(my.string=t(popshift)), my.data[-1])

I split each string on every comma and build a new vector from the first element, the re-joined middle elements, and the last element. Then I cbind that with the rest of the data. The result is:

  my.string.1 my.string.2 my.string.3 some.data
1         123    34,56,78          90        10
2          87       65,43          21        20
3          a4          b6       c8888        30
4          11        bbbb       ccccc        40
5          uu       vv,ww          xx        50
6           j   k,l,m,n,o           p        60
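The same split-and-rejoin idea can be wrapped in a small function. This is a sketch only: `split_ends` is a hypothetical name, and it assumes every string has at least two commas and no empty trailing field, since strsplit drops a trailing empty string.

```r
# Hypothetical helper: first field, re-joined middle fields, last field
split_ends <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  # vapply() returns a 3 x n matrix; transpose to one row per input string
  t(vapply(pieces, function(p) {
    n <- length(p)
    c(p[1], paste(p[-c(1, n)], collapse = ","), p[n])
  }, character(3)))
}

out <- split_ends(c("uu,vv,ww,xx", "a4,b6,c8888"))
out
#      [,1] [,2]    [,3]
# [1,] "uu" "vv,ww" "xx"
# [2,] "a4" "b6"    "c8888"
```

Using vapply with character(3) makes the function fail loudly if a row ever yields something other than three pieces.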
