Dplyr Mutate_each for paired sets of columns

陌清茗 2021-01-14 23:07

Is there a way to achieve the following transformation using dplyr::mutate_each?

data.frame(x1 = 1:5, x2 = 6:10, y1 = rnorm(5), y2 = rnorm(5)) %>%
  mutat         


        
2 Answers
  • 2021-01-14 23:39

    This does not use mutate_each, nor is it very pretty, nor do I think it will be very fast, but:

    #create data set
    p<-data.frame(x1 = 1:5, x2 = 6:10,
              y1 = rnorm(5), y2 = rnorm(5),
              z1 = 11:15, z2 = rnorm(5),
              w1 = rchisq(5,2), w2 = rgamma(5, .2)) 
    
    #subset the columns by their column number and subtract them
    p[,ncol(p)+seq(1,ncol(p)/2, by = 1)]<-
    p[,seq(1,ncol(p),by = 2)]-p[,seq(2,ncol(p), by = 2)]
    

    This extends the data.frame p with half as many new columns as it originally had; each new column contains the difference of one adjacent pair of original columns (1-2, 3-4, 5-6, 7-8).
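
    The positional version above subtracts adjacent columns (x1 - x2, y1 - y2, and so on). If the pairs are instead meant to match on the numeric suffix (x1 with y1, x2 with y2), as the x/y naming in the question suggests, a minimal base R sketch could pair the columns by name. The "diff" prefix and the helper names x_cols, y_cols and diff_cols are illustrative assumptions, not part of the original post.

    ```r
    # Example data with two paired sets of columns (x* and y*)
    df <- data.frame(x1 = 1:5, x2 = 6:10,
                     y1 = c(1.03645018, -1.10790835, 0.95452119, 0.01370762, 0.19354354),
                     y2 = c(-0.8602099, 1.6912875, 2.7232657, 1.6385765, -1.046436))

    # Find the x columns, then derive the matching y and diff names from them
    x_cols    <- grep("^x[0-9]+$", names(df), value = TRUE)
    y_cols    <- sub("^x", "y", x_cols)
    diff_cols <- sub("^x", "diff", x_cols)

    # Subtract each y column from its matching x column, pair by pair
    df[diff_cols] <- Map(`-`, df[x_cols], df[y_cols])
    ```

    Because the pairing is derived from the column names rather than their positions, this keeps working if the columns are reordered, as long as every x column has a y column with the same suffix.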

  • 2021-01-14 23:48

    As mentioned by @Gregor in the comments, if you want to work with dplyr, it would be better to get your data into a tidy format first. Here's an idea:

    library(dplyr)
    library(tidyr)
    
    df %>%
      add_rownames() %>%
      gather(key, val, -rowname) %>%
      separate(key, c("var", "num"), "(?<=[a-z]) ?(?=[0-9])") %>%
      spread(var, val) %>%
      mutate(diff = x - y) 
    

    Which gives:

    #Source: local data frame [10 x 5]
    #
    #   rowname   num     x           y        diff
    #     (chr) (chr) (dbl)       (dbl)       (dbl)
    #1        1     1     1  1.03645018 -0.03645018
    #2        1     2     6 -0.86020990  6.86020990
    #3        2     1     2 -1.10790835  3.10790835
    #4        2     2     7  1.69128750  5.30871250
    #5        3     1     3  0.95452119  2.04547881
    #6        3     2     8  2.72326570  5.27673430
    #7        4     1     4  0.01370762  3.98629238
    #8        4     2     9  1.63857650  7.36142350
    #9        5     1     5  0.19354354  4.80645646
    #10       5     2    10 -1.04643600 11.04643600
    

    If for some reason you still want the data in wide format after performing the operation, you could add to the pipe:

      gather(key, value, -(rowname:num)) %>%
      unite(key_num, key, num, sep = "") %>%
      spread(key_num, value)
    

    Which would give:

    #Source: local data frame [5 x 7]
    #
    #  rowname       diff1     diff2    x1    x2          y1         y2
    #    (chr)       (dbl)     (dbl) (dbl) (dbl)       (dbl)      (dbl)
    #1       1 -0.03645018  6.860210     1     6  1.03645018 -0.8602099
    #2       2  3.10790835  5.308713     2     7 -1.10790835  1.6912875
    #3       3  2.04547881  5.276734     3     8  0.95452119  2.7232657
    #4       4  3.98629238  7.361423     4     9  0.01370762  1.6385765
    #5       5  4.80645646 11.046436     5    10  0.19354354 -1.0464360
    

    Data

    df <- structure(list(x1 = 1:5, x2 = 6:10, y1 = c(1.03645018, -1.10790835, 
    0.95452119, 0.01370762, 0.19354354), y2 = c(-0.8602099, 1.6912875, 
    2.7232657, 1.6385765, -1.046436)), .Names = c("x1", "x2", "y1", 
    "y2"), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
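
    For readers on a more recent tidyr, the same reshape-compute-reshape idea can be sketched with pivot_longer() and pivot_wider(), which supersede gather() and spread(). This assumes tidyr 1.0.0 or later and dplyr are installed; the column name row and the objects long and wide are illustrative choices, not from the original answer.

    ```r
    library(dplyr)
    library(tidyr)

    df <- data.frame(x1 = 1:5, x2 = 6:10,
                     y1 = c(1.03645018, -1.10790835, 0.95452119, 0.01370762, 0.19354354),
                     y2 = c(-0.8602099, 1.6912875, 2.7232657, 1.6385765, -1.046436))

    # Long format: one row per (original row, suffix), with x and y side by side
    long <- df %>%
      mutate(row = row_number()) %>%
      pivot_longer(-row,
                   names_to = c("var", "num"),
                   names_pattern = "([a-z]+)([0-9]+)",
                   values_to = "val") %>%
      pivot_wider(names_from = var, values_from = val) %>%
      mutate(diff = x - y)

    # Back to wide format: x1, x2, y1, y2, diff1, diff2
    wide <- long %>%
      pivot_wider(id_cols = row,
                  names_from = num,
                  values_from = c(x, y, diff),
                  names_sep = "")
    ```

    Splitting the column names with names_pattern replaces the separate() step, and passing several columns to values_from replaces the gather()/unite()/spread() round trip.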
    