I have a dataset with a list of keywords (one keyword per row) and want to add a column containing each keyword's words sorted alphabetically.
Iterate over the rows: split each keyword on " " (strsplit), sort the words, and collapse them back together:
# Generate data (sample() is random, so your letters will differ)
df <- data.frame(KEYWORD = c(paste(sample(letters, 3), collapse = " "),
                             paste(sample(letters, 3), collapse = " ")),
                 stringsAsFactors = FALSE)
# KEYWORD
# z e s
# d a u
# apply() hands each row to the function as a character vector
df$ALPHABETICAL <- apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),
                                                  collapse = " "))
# KEYWORD ALPHABETICAL
# z e s e s z
# d a u a d u
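One caveat with splitting on a single " ": if a keyword contains repeated, leading or trailing spaces, strsplit returns empty strings, which sort first and leave a stray leading space in the result. The sketch below assumes you want any run of whitespace treated as one separator; sort_words is a hypothetical helper name, not part of the answer above.

```r
# Naive split on a single space: the double space yields an empty string ""
naive <- paste(sort(strsplit("house  blue", " ")[[1]]), collapse = " ")
naive  # " blue house" (leading space comes from the empty string)

# Hypothetical helper: trim the ends, then split on runs of whitespace
sort_words <- function(keywords) {
  words <- strsplit(trimws(keywords), "\\s+")
  # vapply guarantees exactly one character string per keyword
  vapply(words, function(w) paste(sort(w), collapse = " "), character(1))
}

sort_words(c("house  blue", " my blue house "))
# returns c("blue house", "blue house my")
```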
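Alternatively, split the whole column at once with strsplit, then sort and collapse each element with sapply:
df$ALPHABETICAL <- sapply(strsplit(df$KEYWORD, " "), function(x) paste(sort(x), collapse = " "))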
df
# KEYWORD ALPHABETICAL
# 1 house blue blue house
# 2 blue house blue house
# 3 my blue house blue house my
# 4 this house is blue blue house is this
# 5 sky orange orange sky
# 6 orange sky orange sky
# 7 the orange sky orange sky the
Data:
df <- data.frame(KEYWORD = c(
'house blue',
'blue house',
'my blue house',
'this house is blue',
'sky orange',
'orange sky',
'the orange sky'),stringsAsFactors = FALSE)
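All keywords in this data are lower case. If yours mix cases, note that sort() orders character vectors by the current locale's collation, so the result can differ between machines. One option, sketched here as an assumption about what you want, is sort(method = "radix"), whose collation always follows the C locale, so upper-case words come before lower-case ones everywhere. The example strings and the name sort_words_c are hypothetical.

```r
# Hypothetical mixed-case keywords (not from the data above)
kw <- c("House blue", "blue House")

# Same split / sort / collapse as above, but with locale-independent ordering:
# method = "radix" collates strings in the C locale (upper case before lower case)
sort_words_c <- function(x) {
  vapply(strsplit(x, " "),
         function(w) paste(sort(w, method = "radix"), collapse = " "),
         character(1))
}

sort_words_c(kw)
# both keywords map to "House blue"
```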
One solution with dplyr + stringr (dplyr is only needed here for the %>% pipe):
library(dplyr)
library(stringr)
KEYWORDS <- c('house blue','blue house','my blue house','this house is blue','sky orange','orange sky','the orange sky')
ALPHABETICAL <- KEYWORDS %>% str_split(., ' ') %>% lapply(., 'sort') %>% lapply(., 'paste', collapse=' ') %>% unlist(.)
The last line uses str_split() to split each keyword into a vector of words, giving a list of vectors. sort is then applied to each list element, paste(collapse = ' ') joins each sorted vector back into a single string, and finally unlist() flattens the list into a character vector.
The result is:
> cbind(KEYWORDS, ALPHABETICAL)
KEYWORDS ALPHABETICAL
[1,] "house blue" "blue house"
[2,] "blue house" "blue house"
[3,] "my blue house" "blue house my"
[4,] "this house is blue" "blue house is this"
[5,] "sky orange" "orange sky"
[6,] "orange sky" "orange sky"
[7,] "the orange sky" "orange sky the"
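To see what each stage of that pipeline produces, here is the same sequence written step by step in base R. As an assumption for this sketch, strsplit() stands in for str_split(); with a plain single-space pattern both return a list holding one character vector per keyword.

```r
KEYWORDS <- c("house blue", "my blue house")

# Stage 1: list of word vectors, list(c("house", "blue"), c("my", "blue", "house"))
split_words <- strsplit(KEYWORDS, " ")
# Stage 2: each vector sorted, list(c("blue", "house"), c("blue", "house", "my"))
sorted_words <- lapply(split_words, sort)
# Stage 3: each vector collapsed to one string, list("blue house", "blue house my")
collapsed <- lapply(sorted_words, paste, collapse = " ")
# Stage 4: flatten the list into a character vector
ALPHABETICAL <- unlist(collapsed)
ALPHABETICAL
```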