Question
I have a dataset with a list of keywords (one keyword per row).
I'm looking for a way to create a new column (ALPHABETICAL) based on the KEYWORD column. The value of the ALPHABETICAL column should be generated automatically from the keyword, with its words sorted alphabetically.
Like this:
| KEYWORD | ALPHABETICAL |
| house blue | blue house |
| blue house | blue house |
| my blue house | blue house my |
| this house is blue | blue house is this |
| sky orange | orange sky |
| orange sky | orange sky |
| the orange sky | orange sky the |
Thanks for your help!
Answer 1:
One solution with dplyr + stringr:
library(dplyr)
library(stringr)
KEYWORDS <- c('house blue','blue house','my blue house','this house is blue','sky orange','orange sky','the orange sky')
ALPHABETICAL <- KEYWORDS %>% str_split(., ' ') %>% lapply(., 'sort') %>% lapply(., 'paste', collapse=' ') %>% unlist(.)
The last line uses str_split() to split each element of KEYWORDS into a character vector, giving a list of vectors; sort is then applied to each list element; each sorted vector is joined back into a single string with paste(collapse = ' '); and finally unlist() flattens the list into a character vector.
The result is
> cbind(KEYWORDS, ALPHABETICAL)
KEYWORDS ALPHABETICAL
[1,] "house blue" "blue house"
[2,] "blue house" "blue house"
[3,] "my blue house" "blue house my"
[4,] "this house is blue" "blue house is this"
[5,] "sky orange" "orange sky"
[6,] "orange sky" "orange sky"
[7,] "the orange sky" "orange sky the"
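To make the intermediate values easier to inspect, the same four steps can be written one per line. This is only a sketch in base R (strsplit() stands in for str_split(), and the variable names are made up for illustration), run on two of the keywords from the question:

```r
# Step-by-step version of the split / sort / paste / unlist pipeline (base R only).
# All variable names here are illustrative.
keywords <- c("my blue house", "sky orange")

words        <- strsplit(keywords, " ", fixed = TRUE)       # list of character vectors
sorted_words <- lapply(words, sort)                         # sort the words within each vector
joined       <- lapply(sorted_words, paste, collapse = " ") # join each vector back into one string
alphabetical <- unlist(joined)                              # flatten the list to a character vector

alphabetical
# [1] "blue house my" "orange sky"
```

Printing words or sorted_words at any point shows what each stage of the pipe hands to the next one.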
Answer 2:
Iterate over the rows, split each one by " " (strsplit), sort, and collapse back:
# Generate data
df <- data.frame(KEYWORD = c(paste(sample(letters, 3), collapse = " "),
paste(sample(letters, 3), collapse = " ")))
# KEYWORD
# z e s
# d a u
# Apply over the KEYWORD column only, so other columns are never mixed into the split
df$ALPHABETICAL <- apply(df["KEYWORD"], 1, function(x) paste(sort(unlist(strsplit(x, " "))),
collapse = " "))
# KEYWORD ALPHABETICAL
# z e s e s z
# d a u a d u
Answer 3:
df$ALPHABETICAL <- sapply(strsplit(df$KEYWORD," "),function(x) paste(sort(x),collapse=" "))
df
# KEYWORD ALPHABETICAL
# 1 house blue blue house
# 2 blue house blue house
# 3 my blue house blue house my
# 4 this house is blue blue house is this
# 5 sky orange orange sky
# 6 orange sky orange sky
# 7 the orange sky orange sky the
Data:
df <- data.frame(KEYWORD = c(
'house blue',
'blue house',
'my blue house',
'this house is blue',
'sky orange',
'orange sky',
'the orange sky'),stringsAsFactors = FALSE)
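If the keywords might be stored as a factor or contain repeated spaces, the same split / sort / paste idea can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the answers above: sort_words() and example_df are hypothetical names, and it assumes that splitting on any run of whitespace is acceptable for your data. vapply() is used instead of sapply() so the result is always a character vector.

```r
# sort_words() is a hypothetical helper name, not a function from any package.
sort_words <- function(x) {
  x <- as.character(x)                            # also accepts factor columns
  parts <- strsplit(trimws(x), "[[:space:]]+")    # split on runs of whitespace
  # Sort the words of each keyword and join them back with single spaces
  vapply(parts, function(w) paste(sort(w), collapse = " "), character(1))
}

# example_df is illustrative data; the second keyword contains a double space
example_df <- data.frame(KEYWORD = c("house blue", "my  blue house"),
                         stringsAsFactors = FALSE)
example_df$ALPHABETICAL <- sort_words(example_df$KEYWORD)
example_df
```

Note that sort() orders strings according to the current locale's collation rules, so keywords with mixed case or accented characters may be ordered differently on different systems.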
Source: https://stackoverflow.com/questions/47304462/how-to-change-the-order-of-words-with-alphabetic-order