I would like to split strings on the first and last comma. Each string has at least two commas. Below is an example data set and the desired result.
A similar ques
Using str_match()
from package stringr
, and a little help from one of your links,
> library(stringr)
> data.frame(str_match(my.data$my.string, "(.+?),(.*),(.+?)$")[,-1],
some.data = my.data$some.data)
# X1 X2 X3 some.data
# 1 123 34,56,78 90 10
# 2 87 65,43 21 20
# 3 a4 b6 c8888 30
# 4 11 bbbb ccccc 40
# 5 uu vv,ww xx 50
# 6 j k,l,m,n,o p 60
Here is a relatively simple approach. In the first line we use sub
to replace the first and last commas with semicolons producing s
. Then we read s
using sep=";"
and finally cbind
the rest of my.data
to it:
s <- sub(",(.*),", ";\\1;", my.data[[1]])
DF <- read.table(text=s, sep =";", col.names=paste0("mystring",1:3), as.is=TRUE)
cbind(DF, my.data[-1])
giving:
mystring1 mystring2 mystring3 some.data
1 123 34,56,78 90 10
2 87 65,43 21 20
3 a4 b6 c8888 30
4 11 bbbb ccccc 40
5 uu vv,ww xx 50
6 j k,l,m,n,o p 60
You can use the \K
operator which keeps text already matched out of the result and a negative look ahead assertion to do this (well almost, there is an annoying comma at the start of the middle portion which I am yet to get rid of in the strsplit
). But I enjoyed this as an exercise in constructing a regex...
x <- '123,34,56,78,90'
strsplit( x , "^[^,]+\\K|,(?=[^,]+$)" , perl = TRUE )
#[[1]]
#[1] "123" ",34,56,78" "90"
^[^,]+
: from the start of the string match one or more characters that are not a ,
\\K
: but don't include those matched characters in the match|
: or you can match...,(?=[^,]+$)
: a ,
so long as it is followed by [(?=...)
] one or more characters that are not a ,
until the end of the string ($
)...Here is code to split on the first and last comma. This code draws heavily from an answer by @bdemarest here: Split string on first two colons The gsub
pattern below, which is the meat of the answer, contains important differences. The code for creating the new data frame after strings are split is the same as that of @bdemarest
# Replace first and last commas with colons.
new.string <- gsub(pattern="(^[^,]+),(.+),([^,]+$)",
replacement="\\1:\\2:\\3", x=my.data$my.string)
new.string
# Split on colons
split.data <- strsplit(new.string, ":")
# Create data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data
# my.string1 my.string2 my.string3 some.data
# 1 123 34,56,78 90 10
# 2 87 65,43 21 20
# 3 a4 b6 c8888 30
# 4 11 bbbb ccccc 40
# 5 uu vv,ww xx 50
# 6 j k,l,m,n,o p 60
# Here is code for splitting strings on the first comma
my.data <- read.table(text='
my.string some.data
123,34,56,78,90 10
87,65,43,21 20
a4,b6,c8888 30
11,bbbb,ccccc 40
uu,vv,ww,xx 50
j,k,l,m,n,o,p 60', header = TRUE, stringsAsFactors=FALSE)
# Replace first comma with colon
new.string <- gsub(pattern="(^[^,]+),(.+$)",
replacement="\\1:\\2", x=my.data$my.string)
new.string
# Split on colon
split.data <- strsplit(new.string, ":")
# Create data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data
# my.string1 my.string2 some.data
# 1 123 34,56,78,90 10
# 2 87 65,43,21 20
# 3 a4 b6,c8888 30
# 4 11 bbbb,ccccc 40
# 5 uu vv,ww,xx 50
# 6 j k,l,m,n,o,p 60
# Here is code for splitting strings on the last comma
my.data <- read.table(text='
my.string some.data
123,34,56,78,90 10
87,65,43,21 20
a4,b6,c8888 30
11,bbbb,ccccc 40
uu,vv,ww,xx 50
j,k,l,m,n,o,p 60', header = TRUE, stringsAsFactors=FALSE)
# Replace last comma with colon
new.string <- gsub(pattern="^(.+),([^,]+$)",
replacement="\\1:\\2", x=my.data$my.string)
new.string
# Split on colon
split.data <- strsplit(new.string, ":")
# Create new data frame
new.data <- data.frame(do.call(rbind, split.data))
names(new.data) <- paste("my.string", seq(ncol(new.data)), sep="")
my.data$my.string <- NULL
my.data <- cbind(new.data, my.data)
my.data
# my.string1 my.string2 some.data
# 1 123,34,56,78 90 10
# 2 87,65,43 21 20
# 3 a4,b6 c8888 30
# 4 11,bbbb ccccc 40
# 5 uu,vv,ww xx 50
# 6 j,k,l,m,n,o p 60
You can do a simple strsplit
here on that column
popshift<-sapply(strsplit(my.data$my.string,","), function(x)
c(x[1], paste(x[2:(length(x)-1)],collapse=","), x[length(x)]))
desired.result <- cbind(data.frame(my.string=t(popshift)), my.data[-1])
I just split up all the values and make a new vector with the first, last and middle strings. Then i cbind that with the rest of the data. The result is
my.string.1 my.string.2 my.string.3 some.data
1 123 34,56,78 90 10
2 87 65,43 21 20
3 a4 b6 c8888 30
4 11 bbbb ccccc 40
5 uu vv,ww xx 50
6 j k,l,m,n,o p 60