Suppose I have a dataframe like this:
df <- data.frame(a = c("AA", "BB"), b = c("short string", "this is the longer string"))
I would like to split column b at its last space into two columns, partA and partB.
You may turn the [^ ]*$ part of your regex into (?=[^ ]*$), a positive lookahead. A lookahead is non-consuming: the non-whitespace characters at the end of the string are checked but not put into the match value, so they stay in the output:
library(tidyr)
df %>%
  separate(b, c("partA", "partB"), sep = " (?=[^ ]*$)")
Or, a slightly more general version that matches any run of whitespace characters, not only a single space:
df %>%
  separate(b, c("partA", "partB"), sep = "\\s+(?=\\S*$)")
Output:
   a              partA  partB
1 AA              short string
2 BB this is the longer string
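To see what "non-consuming" means in practice, here is a minimal base R sketch, assuming only the two example strings from the question. The variable names (x, consumed, kept, parts) are made up for illustration. It replaces the match with a "|" marker so the difference between the original pattern and the lookahead version is visible, then splits on the lookahead pattern.

```r
x <- c("short string", "this is the longer string")

# Consuming pattern: the last word is part of the match, so it is replaced too.
consumed <- sub(" [^ ]*$", "|", x)

# Lookahead pattern: only the space is matched; the last word stays in place.
# perl = TRUE is required because lookaheads are a PCRE feature.
kept <- sub(" (?=[^ ]*$)", "|", x, perl = TRUE)

# Splitting on the lookahead pattern therefore keeps both parts.
parts <- strsplit(x, "\\s+(?=\\S*$)", perl = TRUE)

print(consumed)
print(kept)
print(parts)
```

With the consuming pattern the last word disappears along with the separator, which is why it cannot be used directly as sep in separate.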
We can use extract from tidyr with capture groups ((...)). We match zero or more characters (.*) inside the first capture group ((.*)), followed by one or more whitespace characters (\\s+), followed by a second capture group that matches one or more characters that are not a space ([^ ]+) up to the end ($) of the string. Because .* is greedy, the first group takes everything up to the last run of whitespace.
library(tidyr)
extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
#   a              partA  partB
#1 AA              short string
#2 BB this is the longer string
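For comparison, the same capture-group idea can be sketched in base R without tidyr. This is only an illustration under the assumption that every string contains at least one space; the names x, pattern, m, and res are hypothetical. regexec returns the full match plus one entry per capture group, and regmatches pulls out the corresponding substrings.

```r
x <- c("short string", "this is the longer string")
pattern <- "(.*)\\s+([^ ]+)$"

# For each string, regmatches() returns: full match, group 1, group 2.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Pick the two capture groups out of each result.
res <- data.frame(
  partA = vapply(m, function(g) g[2], character(1)),
  partB = vapply(m, function(g) g[3], character(1)),
  stringsAsFactors = FALSE
)
print(res)
```

A string with no whitespace would not match the pattern at all, so g[2] and g[3] would be NA for that row in this sketch.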