How to reorder the words in a string alphabetically

误落风尘 2021-01-20 09:59

I have a dataset with a list of keywords (one keyword per row).

  1. I'm looking for a way to create a new column (ALPHABETICAL) based on the KEYWORD column. The value o
3 answers
  • 2021-01-20 10:41

    Iterate over the rows, split each keyword on " " with strsplit(), sort the words, and collapse them back into one string with paste():

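    # Generate example data: two keywords of three random letters each
    df <- data.frame(KEYWORD = c(paste(sample(letters, 3), collapse = " "),
                                 paste(sample(letters, 3), collapse = " ")),
                     stringsAsFactors = FALSE)
    #  KEYWORD   (random; the letters differ on every run)
    #   z e s
    #   d a u

    # For each row: split KEYWORD on spaces, sort the words, paste them back together.
    # Selecting df["KEYWORD"] keeps apply() from also picking up other columns.
    df$ALPHABETICAL <- apply(df["KEYWORD"], 1, function(x) paste(sort(unlist(strsplit(x, " "))),
                                                                 collapse = " "))
    #  KEYWORD ALPHABETICAL
    #   z e s        e s z
    #   d a u        a d u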
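
The split, sort, and collapse steps above can also be wrapped in a small reusable function. The following is only a sketch: sort_words is a hypothetical helper name, not something from the answer, and it uses vapply() instead of apply() so that the result is guaranteed to be one string per input element.

```r
# Hypothetical helper (not part of the original answer):
# sort the words inside each element of a character vector.
sort_words <- function(x, sep = " ") {
  vapply(
    strsplit(x, sep, fixed = TRUE),                      # one vector of words per string
    function(words) paste(sort(words), collapse = sep),  # sort, then join back
    character(1)                                         # exactly one string per input
  )
}

# Fixed example values so the result is reproducible
df <- data.frame(KEYWORD = c("z e s", "d a u"), stringsAsFactors = FALSE)
df$ALPHABETICAL <- sort_words(df$KEYWORD)
```

Because the function receives the column directly, it does not depend on how many other columns the data frame has.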
    
  • 2021-01-20 10:44
    df$ALPHABETICAL <- sapply(strsplit(df$KEYWORD, " "), function(x) paste(sort(x), collapse = " "))
    
    df
    #              KEYWORD       ALPHABETICAL
    # 1         house blue         blue house
    # 2         blue house         blue house
    # 3      my blue house      blue house my
    # 4 this house is blue blue house is this
    # 5         sky orange         orange sky
    # 6         orange sky         orange sky
    # 7     the orange sky     orange sky the
    

    Data:

    df <- data.frame(KEYWORD = c(
      'house blue',
      'blue house',
      'my blue house',
      'this house is blue',
      'sky orange',
      'orange sky',
      'the orange sky'),stringsAsFactors = FALSE)  
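
The stringsAsFactors = FALSE argument in the data block matters on R versions before 4.0.0, where data.frame() converted strings to factors by default; strsplit() raises an error for non-character input. A minimal sketch, assuming the column might arrive as a factor, is to wrap it in as.character() first (the df_factor name is made up for illustration):

```r
# Assumed scenario: KEYWORD was read in as a factor
df_factor <- data.frame(KEYWORD = c("house blue", "the orange sky"),
                        stringsAsFactors = TRUE)

# strsplit(df_factor$KEYWORD, " ") would stop with "non character argument",
# so convert the factor to character before splitting.
words <- strsplit(as.character(df_factor$KEYWORD), " ")
df_factor$ALPHABETICAL <- sapply(words, function(x) paste(sort(x), collapse = " "))
```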
    
  • 2021-01-20 10:49

    One solution with dplyr + stringr

    library(dplyr)
    library(stringr)
    KEYWORDS <- c('house blue', 'blue house', 'my blue house', 'this house is blue',
                  'sky orange', 'orange sky', 'the orange sky')

    ALPHABETICAL <- KEYWORDS %>%
      str_split(' ') %>%                  # split each keyword into a vector of words
      lapply(sort) %>%                    # sort the words within each keyword
      lapply(paste, collapse = ' ') %>%   # join the sorted words back into one string
      unlist()                            # flatten the list into a character vector


    The pipeline uses str_split() to split KEYWORDS into a list of character vectors; sort() is then applied to each list element; each sorted vector is joined back into a single string with paste(); and finally unlist() flattens the list into a character vector.

    The result is

    > cbind(KEYWORDS, ALPHABETICAL)
         KEYWORDS             ALPHABETICAL        
    [1,] "house blue"         "blue house"        
    [2,] "blue house"         "blue house"        
    [3,] "my blue house"      "blue house my"     
    [4,] "this house is blue" "blue house is this"
    [5,] "sky orange"         "orange sky"        
    [6,] "orange sky"         "orange sky"        
    [7,] "the orange sky"     "orange sky the" 
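
The question asks for a new column in a data frame, whereas this answer works on a standalone vector. As a hedged sketch of how the same idea might be applied to a column, the following uses dplyr's mutate() together with str_split(); the three-row df is sample data taken from the keywords above, and the use of vapply() is an illustrative choice, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Small sample data frame (subset of the keywords used in this answer)
df <- data.frame(
  KEYWORD = c('house blue', 'my blue house', 'the orange sky'),
  stringsAsFactors = FALSE
)

# Add ALPHABETICAL as a new column: split, sort, and re-join each keyword
df <- df %>%
  mutate(ALPHABETICAL = vapply(
    str_split(KEYWORD, ' '),
    function(words) paste(sort(words), collapse = ' '),
    character(1)
  ))
```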
    