Mutating multiple columns in a data frame using dplyr

后端 未结 4 2020
一个人的身影
一个人的身影 2020-12-09 05:24

I have the following data frame df:

  v1 v2 v3 v4
1  1  5  7  4
2  2  6 10  3

And I want to obtain the following data frame

相关标签:
4条回答
  • 2020-12-09 05:36

    You are really close.

    df2 <- 
        df %>% 
        mutate(v1v3 = v1 * v3,
               v2v4 = v2 * v4)
    

    such a beautifully simple language, right?

    For more great tricks please see here.

    EDIT: Thanks to @Facottons pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to resolving this issue. It keeps one from having to write a line to hard code in each new column desired. While it is a bit more verbose than the Base R approach, the logic is at least more immediately transparent/readable. It is also worth noting that there must be at least half as many rows as there are columns for this approach to work.

    # prep the product column names (also acting as row numbers)
    df <- 
        df %>%
        mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2)) 
    
    # converting data to tidy format and pairing columns to be multiplied together.
    tidy_df <- 
        df %>%
        gather(column, value, -prod_grp) %>% 
        mutate(column = as.numeric(sub("v", "", column)),
               pair = column - 2) %>% 
        mutate(pair = if_else(pair < 1, pair + 2, pair))
    
    # summarize the products for each column
    prod_df <- 
        tidy_df %>% 
        group_by(prod_grp, pair) %>% 
        summarize(val = prod(value)) %>% 
        spread(prod_grp, val) %>% 
        mutate(pair = paste0("v", pair, "v", pair + 2)) %>% 
        rename(prod_grp = pair)
    
    # put the original frame and summary frames together
    final_df <- 
        df %>% 
        left_join(prod_df) %>% 
        select(-prod_grp)
    
    0 讨论(0)
  • 2020-12-09 05:57

    I think I found a solution:

    df %>%
      mutate(n = df[1:(ncol(df)/2)] * df[(1+ncol(df)/2):(ncol(df))]) %>% head()
    

    The result is valid for any number of variables. It only remains a problem with the name of the new variables. This is the result:

      v1 v2 v3 v4 n.v1 n.v2
    1  1  5  7  4    7   20
    2  2  6 10  3   20   18
    
    0 讨论(0)
  • 2020-12-09 05:58

    We can use base R instead of using any extra packages like dplyr or data.table

    We can use mapply to vectorize the operation for multiple vectors at the same time

    n <- ncol(df)/2
    mapply(`*`, df[1:n], df[(n + 1):ncol(df)])
    
    #     v1 v2
    #[1,]  7 20
    #[2,] 20 18
    

    We can merge (cbind) this dataframe to your original one then.


    If you are interested in tidyverse solution the equivalent in purrr would be variants of map2

    purrr::map2_df(df[1:n], df[(n + 1):ncol(df)], `*`)
    
    # A tibble: 2 x 2
    #     v1    v2
    #  <dbl> <dbl>
    #1     7    20
    #2    20    18
    
    0 讨论(0)
  • 2020-12-09 06:00

    Just use mutate as is with a comma to separate new columns mutate(df,"v1v3"=v1*v3,"v2v4"= v2*v4)

    0 讨论(0)
提交回复
热议问题