Question
I'd like to split each string into its individual letters or symbols, one per column, to build a new data.frame with as many columns as there are letters. I tried the separate function from the tidyr package, but the result is not what I wanted.
df <- data.frame(x = c('house', 'mouse'), y = c('count', 'apple'), stringsAsFactors = FALSE)
Unexpected result
df[1, ] %>% separate(x, c('A1', 'A2', 'A3', 'A4', 'A5'), sep ='')
A1 A2 A3 A4 A5 y
1 <NA> <NA> <NA> <NA> <NA> count
Expected output
A1 A2 A3 A4 A5
h o u s e
m o u s e
Solutions using stringr are welcome.
Answer 1:
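We can use regex lookarounds in sep to match the boundary between each pair of adjacent characters. The pattern below only splits between lowercase ASCII letters, so it would need widening to handle uppercase letters or other symbols.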
library(dplyr)
library(tidyr)
library(stringr)
df %>%
select(x) %>%
separate(x, into = str_c("A", 1:5), sep= "(?<=[a-z])(?=[a-z])")
# A1 A2 A3 A4 A5
#1 h o u s e
#2 m o u s e
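Since the question also asks for stringr, here is a minimal sketch that skips separate altogether. It assumes that a character matrix converted to a data.frame is an acceptable result; the names chars and res are illustrative and not part of the original answer.

```r
library(stringr)

df <- data.frame(x = c("house", "mouse"), y = c("count", "apple"),
                 stringsAsFactors = FALSE)

# An empty pattern splits at every character boundary; simplify = TRUE
# returns a character matrix with one column per character.
chars <- str_split(df$x, "", simplify = TRUE)

# Convert to a data.frame and name the columns A1..An
# (the naming scheme is taken from the question).
res <- as.data.frame(chars, stringsAsFactors = FALSE)
names(res) <- str_c("A", seq_len(ncol(res)))
res
```

Because the split is on character boundaries rather than on [a-z], this variant should also handle uppercase letters and symbols.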
Answer 2:
A solution in base R would be:
do.call(rbind, sapply(df$x, function(col) strsplit(col, "")))
# [,1] [,2] [,3] [,4] [,5]
# house "h" "o" "u" "s" "e"
# mouse "m" "o" "u" "s" "e"
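The call above returns a character matrix. If a data.frame with the A1..A5 names from the question is wanted, one possible sketch follows. The NA padding for strings of unequal length is an assumption beyond the original example, and parts, width, padded and res are illustrative names.

```r
df <- data.frame(x = c("house", "mouse"), y = c("count", "apple"),
                 stringsAsFactors = FALSE)

# Split every string into single characters (a list of character vectors).
parts <- strsplit(df$x, "")

# Pad shorter strings with NA so that all rows have the same width.
# (Assumption: not needed for the example data, where both words have 5 letters.)
width <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))

# Bind the rows into a matrix, then convert and name the columns.
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(res) <- paste0("A", seq_len(width))
res
```

Without the padding step, rbind would recycle the shorter vectors and emit a warning when the strings differ in length.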
Answer 3:
We can use cSplit from splitstackshape with stripWhite = FALSE and sep = "" to split every letter in a column into its own column.
splitstackshape::cSplit(df, "x", sep = "", stripWhite = FALSE)
# y x_1 x_2 x_3 x_4 x_5
#1: count h o u s e
#2: apple m o u s e
Source: https://stackoverflow.com/questions/59166057/separate-string-into-many-columns