R mutate multiple columns with if statement

Posted by 二次信任 on 2021-02-05 09:11:57

Question


I have data like this:

cols <- c("X01_01", "X01_01_p", "X01_02", "X01_02_p", "X01_03", "X01_03_p", "X01_04", "X01_05", "X01_06")
set.seed(111)
values <- replicate(9, sample(1:5, 4, replace = TRUE))
df <- as.data.frame(values)
names(df) <- cols  # without this the columns would be named V1..V9

So my df looks like this:

    X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06
1      3        2      3        1      1        3      5      4      3
2      4        3      1        1      5        2      2      3      3
3      2        1      3        1      2        2      4      1      2
4      3        3      3        3      4        2      2      3      4

I want to mutate only some of these columns (not all), and I have the names for the new columns:

cols_to_mutate <- c("X01_01_p","X01_02_p", "X01_03_p", "X01_04", "X01_05","X01_06")
new_cols <- c("X01_01_n","X01_02_n", "X01_03_n", "X01_04_n", "X01_05_n","X01_06_n")

Each mutation is the same:

  • If the value is 1 or 2, the new value has to be 0
  • If the value is 3, the new value has to be 0.5
  • If the value is 4 or 5, the new value has to be 1

Ultimately my df looks like this:

    X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06 X01_01_n X01_02_n X01_03_n X01_04_n X01_05_n X01_06_n
1      3        2      3        1      1        3      5      4      3      0.0      0.0      0.5        1      1.0      0.5
2      4        3      1        1      5        2      2      3      3      0.5      0.0      0.0        0      0.5      0.5
3      2        1      3        1      2        2      4      1      2      0.0      0.0      0.0        1      0.0      0.0
4      3        3      3        3      4        2      2      3      4      0.5      0.5      0.0        0      0.5      1.0

Hard-coded, I could write many lines like this:

df <- mutate(df, X01_01_n = ifelse(X01_01_p <= 2, 0, (ifelse(X01_01_p == 3, 0.5, 1))))
df <- mutate(df, X01_02_n = ifelse(X01_02_p <= 2, 0, (ifelse(X01_02_p == 3, 0.5, 1))))

Of course I am looking for a neater and quicker way to do this. I searched and searched but did not find a solution. I tried:

df <- cbind(df,apply(df[,cols_to_mutate],2, function(x) if (x < 3) { 0} else if (x > 3) {1} else {.5}))

But this does not work. Any ideas would be great!!


Answer 1:


If you don't need to keep the original columns and can mutate in place instead, you can use mutate_at with a case_when inside the mutating function.

case_when uses dplyr's between function to set up the conditions and assigns a value to each one with ~. The last clause, T ~ NA_real_ (T is shorthand for TRUE), assigns NA to any observation that matched none of the conditions.
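To see the recoding rule in isolation, here is a minimal sketch on a plain vector. The input vector is made up for illustration (it is not from the question), and 7 is included only to show the NA fallback.

```r
library(dplyr)

# Illustrative input, not from the question; 7 is deliberately out of range.
x <- c(1, 2, 3, 4, 5, 7)

recoded <- case_when(
  between(x, 1, 2) ~ 0,        # 1 or 2 -> 0
  x == 3           ~ 0.5,      # 3      -> 0.5
  between(x, 4, 5) ~ 1,        # 4 or 5 -> 1
  TRUE             ~ NA_real_  # anything else -> NA
)

print(recoded)
```

The clauses are checked in order and the first match wins, so the catch-all TRUE clause has to come last.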

library(tidyverse)

cols_to_mutate <- c("X01_01_p","X01_02_p", "X01_03_p", "X01_04", "X01_05","X01_06")

df %>%
  mutate_at(cols_to_mutate, function(x) {
    case_when(
      between(x, 1, 2) ~ 0,
      x == 3 ~ 0.5,
      between(x, 4, 5) ~ 1,
      T ~ NA_real_
    )
  })
#>   X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06
#> 1      3      0.0      3      0.0      1      0.5      1    1.0    0.5
#> 2      4      0.5      1      0.0      5      0.0      0    0.5    0.5
#> 3      2      0.0      3      0.0      2      0.0      1    0.0    0.0
#> 4      3      0.5      3      0.5      4      0.0      0    0.5    1.0
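In dplyr 1.0.0 and later, mutate_at is superseded by across, which can also keep the original columns through its .names argument. The following is only a sketch under that assumption: recode_score is a helper name I made up, the data frame is typed in from the printed data in the question, and the rename_with step is one possible way to get the X01_01_n style names the question asks for.

```r
library(dplyr)

# Data typed in from the data frame printed in the question.
df <- data.frame(
  X01_01 = c(3, 4, 2, 3), X01_01_p = c(2, 3, 1, 3),
  X01_02 = c(3, 1, 3, 3), X01_02_p = c(1, 1, 1, 3),
  X01_03 = c(1, 5, 2, 4), X01_03_p = c(3, 2, 2, 2),
  X01_04 = c(5, 2, 4, 2), X01_05 = c(4, 3, 1, 3),
  X01_06 = c(3, 3, 2, 4)
)
cols_to_mutate <- c("X01_01_p", "X01_02_p", "X01_03_p", "X01_04", "X01_05", "X01_06")

# Hypothetical helper: the same rule as the case_when above.
recode_score <- function(x) {
  case_when(
    between(x, 1, 2) ~ 0,
    x == 3           ~ 0.5,
    between(x, 4, 5) ~ 1,
    TRUE             ~ NA_real_
  )
}

result <- df %>%
  # Keep the originals; add recoded copies with "_n" appended to each name.
  mutate(across(all_of(cols_to_mutate), recode_score, .names = "{.col}_n")) %>%
  # Turn "X01_01_p_n" into "X01_01_n" so the names match the question.
  rename_with(~ sub("_p_n$", "_n", .x), ends_with("_p_n"))

print(names(result))
```

The new columns are appended after the original nine, in the order given by cols_to_mutate.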

If you need to keep the original columns and give the rescaled columns new names, here is some rlang + purrr trickery. I used imap to iterate over the columns of the data frame together with their names. If a name is in cols_to_mutate, the function applies the same case_when as above and returns a two-column tibble: the original column, named using quo_name and the := operator, and the new values, named the same but with _n appended. Otherwise it returns a one-column tibble holding just the original column. imap_dfc then binds all of these back together column-wise into one data frame. Note that the new columns built from the _p inputs come out named X01_01_p_n and so on, not X01_01_n.

df %>%
  imap_dfc(function(x, name) {
    if (name %in% cols_to_mutate) {
      new_vals <- case_when(
        between(x, 1, 2) ~ 0,
        x == 3 ~ 0.5,
        between(x, 4, 5) ~ 1,
        T ~ NA_real_
      )
      tibble(!!quo_name(name) := x, !!quo_name(paste0(name, "_n")) := new_vals)
    } else {
      tibble(!!quo_name(name) := x)
    }
  })
#> # A tibble: 4 x 15
#>   X01_01 X01_01_p X01_01_p_n X01_02 X01_02_p X01_02_p_n X01_03 X01_03_p
#>    <int>    <int>      <dbl>  <int>    <int>      <dbl>  <int>    <int>
#> 1      3        2        0        3        1        0        1        3
#> 2      4        3        0.5      1        1        0        5        2
#> 3      2        1        0        3        1        0        2        2
#> 4      3        3        0.5      3        3        0.5      4        2
#> # ... with 7 more variables: X01_03_p_n <dbl>, X01_04 <int>,
#> #   X01_04_n <dbl>, X01_05 <int>, X01_05_n <dbl>, X01_06 <int>,
#> #   X01_06_n <dbl>



Answer 2:


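You could do something like this, which assumes your numbers only take the values 1 through 5. cut bins each value into (0,2], (2,3] or (3,10], whose integer codes 1, 2, 3 become 0, 0.5, 1 after dividing by 2 and subtracting 0.5. As written, it recodes every column of df and appends _n to each name, so it also creates X01_01_n from X01_01 rather than from X01_01_p.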

map_marlein <- function(x) {
  if (any(!x %in% 1:5)) {
    stop("Needs numbers from 1-5")
  }
  as.integer(cut(x, c(0,2,3, 10))) / 2 - 0.5
}

df[, paste0(names(df), "_n")] <- lapply(df[, names(df)], map_marlein)
df
  X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06 X01_01_n X01_01_p_n X01_02_n X01_02_p_n X01_03_n X01_03_p_n X01_04_n X01_05_n X01_06_n
1      3        2      3        1      1        3      5      4      3      0.5        0.0      0.5        0.0        0        0.5        1      1.0      0.5
2      4        3      1        1      5        2      2      3      3      1.0        0.5      0.0        0.0        1        0.0        0      0.5      0.5
3      2        1      3        1      2        2      4      1      2      0.0        0.0      0.5        0.0        0        0.0        1      0.0      0.0
4      3        3      3        3      4        2      2      3      4      0.5        0.5      0.5        0.5        1        0.0        0      0.5      1.0
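To restrict this to cols_to_mutate and use the names in new_cols, one base R option is a lookup vector indexed by the original value. This is a sketch of a swapped-in technique, not the answer's cut approach: score_map and recode_lookup are names I made up, and the data frame is typed in from the printed data in the question. It also shows why the apply attempt in the question fails: if expects a single TRUE/FALSE, whereas indexing works on a whole column at once.

```r
# Data typed in from the data frame printed in the question.
df <- data.frame(
  X01_01 = c(3, 4, 2, 3), X01_01_p = c(2, 3, 1, 3),
  X01_02 = c(3, 1, 3, 3), X01_02_p = c(1, 1, 1, 3),
  X01_03 = c(1, 5, 2, 4), X01_03_p = c(3, 2, 2, 2),
  X01_04 = c(5, 2, 4, 2), X01_05 = c(4, 3, 1, 3),
  X01_06 = c(3, 3, 2, 4)
)
cols_to_mutate <- c("X01_01_p", "X01_02_p", "X01_03_p", "X01_04", "X01_05", "X01_06")
new_cols <- c("X01_01_n", "X01_02_n", "X01_03_n", "X01_04_n", "X01_05_n", "X01_06_n")

# Hypothetical lookup table: position i holds the new value for original value i.
score_map <- c(0, 0, 0.5, 1, 1)

# Hypothetical helper: vectorized recode by indexing into the lookup table.
recode_lookup <- function(x) {
  stopifnot(all(x %in% 1:5))  # same assumption as above: values are 1 to 5
  score_map[x]
}

# Assigning a list to new column names adds one column per list element.
df[new_cols] <- lapply(df[cols_to_mutate], recode_lookup)

print(df[new_cols])
```

Because cols_to_mutate and new_cols are matched by position, the two vectors must be the same length and in the same order.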


Source: https://stackoverflow.com/questions/50914681/r-mutate-multiple-columns-with-if-statement
