Dictionary style replace multiple items

太阳男子 2020-11-22 05:02

I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.

Currently I am going about it li

10 answers
  • 2020-11-22 05:21

    We can also use dplyr::case_when

    library(dplyr)
    
    foo %>%
       mutate_all(~case_when(. == "AA" ~ "0101", 
                             . == "AC" ~ "0102", 
                             . == "AG" ~ "0103", 
                             TRUE ~ .))
    
    #  snp1 snp2 snp3
    #1 0101 0101 <NA>
    #2 0103   AT   GG
    #3 0101 0103   GG
    #4 0101 0101   GC
    

    case_when checks each condition in order and returns the corresponding replacement for the first one that is TRUE. We can add more conditions if needed, and the final TRUE ~ . keeps a value as it is when none of the conditions match. If we want unmatched values to become NA instead, we can remove that last line.

    foo %>%
      mutate_all(~case_when(. == "AA" ~ "0101", 
                            . == "AC" ~ "0102", 
                            . == "AG" ~ "0103"))
    
    #  snp1 snp2 snp3
    #1 0101 0101 <NA>
    #2 0103 <NA> <NA>
    #3 0101 0103 <NA>
    #4 0101 0101 <NA>
    

    This changes a value to NA if none of the conditions above is satisfied.


    Another option, using only base R, is to create a lookup data frame with the old and new values, unlist the data frame, match its values against the old values, and replace them with the corresponding new values. Note that any value without an entry in the lookup table becomes NA.

    lookup <- data.frame(old_val = c("AA", "AC", "AG"), 
                         new_val = c("0101", "0102", "0103"))
    
    foo[] <- lookup$new_val[match(unlist(foo), lookup$old_val)]
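
    If unmatched values should be kept rather than turned into NA, the same match() idea can be combined with ifelse(). This is only a sketch: foo is reconstructed here from the output printed earlier in this answer, which is an assumption about the original data.

    ```r
    # Assumption: foo rebuilt from the printed output above
    foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                      snp2 = c("AA", "AT", "AG", "AA"),
                      snp3 = c(NA, "GG", "GG", "GC"),
                      stringsAsFactors = FALSE)

    lookup <- data.frame(old_val = c("AA", "AC", "AG"),
                         new_val = c("0101", "0102", "0103"),
                         stringsAsFactors = FALSE)

    old <- unlist(foo, use.names = FALSE)   # all cells, column by column
    idx <- match(old, lookup$old_val)       # NA where there is no lookup entry

    # Keep the original value wherever the lookup has no entry
    foo[] <- ifelse(is.na(idx), old, lookup$new_val[idx])
    ```

    Values such as "AT" and "GG" then stay untouched, and a missing value stays NA.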
    
  • 2020-11-22 05:22

    One of the most readable ways to replace values in a string, or in a vector of strings, using a dictionary is str_replace_all from the stringr package. The pattern passed to str_replace_all can be a dictionary (a named character vector), e.g.,

    # 1. Build the dictionary: names are the patterns, values are the replacements
    dictio_replace <- c("AA" = "0101",
                        "AC" = "0102",
                        "AG" = "0103") # short example dictionary
    
    # 2. Replace every pattern with its dictionary value (works on a single string or a single character vector)
    foo$snp1 <- stringr::str_replace_all(string = foo$snp1,
                                         pattern = dictio_replace)  # 'replacement' is not needed because the named vector supplies it
    

    Repeat step 2 for foo$snp2 and foo$snp3. If you have more columns to transform, it is better to apply the replacement with a function over every column of the data frame than to repeat yourself.
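
    One possible way to avoid the repetition is lapply() over the columns. This is a sketch, not the only option; foo is reconstructed from the output shown in the other answers, which is an assumption about the original data.

    ```r
    library(stringr)

    # Assumption: foo rebuilt from the output printed in the other answers
    foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                      snp2 = c("AA", "AT", "AG", "AA"),
                      snp3 = c(NA, "GG", "GG", "GC"),
                      stringsAsFactors = FALSE)

    dictio_replace <- c("AA" = "0101",
                        "AC" = "0102",
                        "AG" = "0103")

    # Apply the same dictionary to every column; foo[] keeps the data frame shape
    foo[] <- lapply(foo, str_replace_all, pattern = dictio_replace)
    ```

    Keep in mind that the names of the dictionary are treated as regular expressions and match anywhere inside a string, so a longer value such as "AAG" would be partly rewritten. Anchoring the names (for example "^AA$") is one way to restrict the replacement to whole values.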

  • 2020-11-22 05:24

    If you're open to using packages, plyr is a very popular one and has this handy mapvalues() function that will do just what you're looking for:

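    # mapvalues() expects a vector or factor, so apply it column by column
    foo[] <- lapply(foo, plyr::mapvalues, from = c("AA", "AC", "AG"), to = c("0101", "0102", "0103"), warn_missing = FALSE)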
    

    Note that it works for data types of all kinds, not just strings.

  • 2020-11-22 05:27

    Here is a quick solution

    dict = list(AA = '0101', AC = '0102', AG = '0103')
    foo2 = foo
    for (i in seq_along(dict)) {foo2 <- replace(foo2, foo2 == names(dict)[i], dict[[i]])}
    
  • 2020-11-22 05:32

    Here's something simple that will do the job:

    key <- c('AA','AC','AG')
    val <- c('0101','0102','0103')
    
    lapply(1:3,FUN = function(i){foo[foo == key[i]] <<- val[i]})
    foo
    
      snp1 snp2 snp3
    1 0101 0101 <NA>
    2 0103   AT   GG
    3 0101 0103   GG
    4 0101 0101   GC
    

    lapply returns a list here that we don't actually care about. You can assign it to a throwaway variable, or wrap the call in invisible() to keep it from printing. I'm iterating over the indices here, but you could just as easily place the keys and values in a list and iterate over that directly. Note the use of global assignment with <<-: a plain <- inside the function would only modify a local copy of foo.

    I tinkered with a way to do this with mapply but my first attempt didn't work, so I switched. I suspect a solution with mapply is possible, though.
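
    If the global assignment feels uncomfortable, a plain for loop does the same replacement without <<-. A minimal sketch, with foo reconstructed from the output above (an assumption about the original data):

    ```r
    # Assumption: foo rebuilt from the output printed above
    foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                      snp2 = c("AA", "AT", "AG", "AA"),
                      snp3 = c(NA, "GG", "GG", "GC"),
                      stringsAsFactors = FALSE)

    key <- c("AA", "AC", "AG")
    val <- c("0101", "0102", "0103")

    # The loop body runs in the calling environment, so <- modifies foo directly
    for (i in seq_along(key)) {
      foo[foo == key[i]] <- val[i]
    }
    ```

    Using seq_along(key) instead of a hard-coded 1:3 keeps the loop correct when the dictionary grows.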

  • 2020-11-22 05:32

    Since it's been a few years since the last answer, and a new question came up tonight on this topic and a moderator closed it, I'll add it here. The poster has a large data frame containing 0, 1, and 2, and wants to change them to AA, AB, and BB respectively.

    Use plyr:

    > df <- data.frame(matrix(sample(c(NA, c("0","1","2")), 100, replace = TRUE), 10))
    > df
         X1   X2   X3 X4   X5   X6   X7   X8   X9  X10
    1     1    2 <NA>  2    1    2    0    2    0    2
    2     0    2    1  1    2    1    1    0    0    1
    3     1    0    2  2    1    0 <NA>    0    1 <NA>
    4     1    2 <NA>  2    2    2    1    1    0    1
    ... to 10th row
    
    > df[] <- lapply(df, as.character)
    

    Apply revalue over the columns of the data frame to replace multiple terms. Note that apply returns a character matrix rather than a data frame:

    > library(plyr)
    > apply(df, 2, function(x) {x <- revalue(x, c("0"="AA","1"="AB","2"="BB")); x})
          X1   X2   X3   X4   X5   X6   X7   X8   X9   X10 
     [1,] "AB" "BB" NA   "BB" "AB" "BB" "AA" "BB" "AA" "BB"
     [2,] "AA" "BB" "AB" "AB" "BB" "AB" "AB" "AA" "AA" "AB"
     [3,] "AB" "AA" "BB" "BB" "AB" "AA" NA   "AA" "AB" NA  
     [4,] "AB" "BB" NA   "BB" "BB" "BB" "AB" "AB" "AA" "AB"
    ... and so on
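
    If the result should stay a data frame, one option is to use lapply() and assign back into df[]. This is a sketch on a small hand-made data frame (hypothetical values, not the random sample above):

    ```r
    library(plyr)

    # Hypothetical small example instead of the random sample above
    df <- data.frame(X1 = c("1", "0", NA),
                     X2 = c("2", "2", "0"),
                     stringsAsFactors = FALSE)

    # revalue() works on one character vector (or factor) at a time;
    # warn_missing = FALSE silences the note about codes absent from a column
    df[] <- lapply(df, revalue,
                   replace = c("0" = "AA", "1" = "AB", "2" = "BB"),
                   warn_missing = FALSE)
    ```

    Missing values are left as NA, and df keeps its column names and data frame class.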
    