Question
I have data like this:
cols <- c("X01_01","X01_01_p", "X01_02","X01_02_p", "X01_03","X01_03_p", "X01_04", "X01_05","X01_06")
set.seed(111)
values <- replicate(9, sample(1:5, 4, replace = TRUE))
df <- as.data.frame(values)
names(df) <- cols
So my df looks like this:
X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06
1 3 2 3 1 1 3 5 4 3
2 4 3 1 1 5 2 2 3 3
3 2 1 3 1 2 2 4 1 2
4 3 3 3 3 4 2 2 3 4
I want to mutate some (not all) of the columns, and I have the names for the new columns:
cols_to_mutate <- c("X01_01_p","X01_02_p", "X01_03_p", "X01_04", "X01_05","X01_06")
new_cols <- c("X01_01_n","X01_02_n", "X01_03_n", "X01_04_n", "X01_05_n","X01_06_n")
Each mutation is the same:
- If the value is 1 or 2, the new value has to be 0
- If the value is 3, the new value has to be 0.5
- If the value is 4 or 5, the new value has to be 1
Ultimately my df looks like this:
X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06 X01_01_n X01_02_n X01_03_n X01_04_n X01_05_n X01_06_n
1 3 2 3 1 1 3 5 4 3 0.0 0.0 0.5 1 1.0 0.5
2 4 3 1 1 5 2 2 3 3 0.5 0.0 0.0 0 0.5 0.5
3 2 1 3 1 2 2 4 1 2 0.0 0.0 0.0 1 0.0 0.0
4 3 3 3 3 4 2 2 3 4 0.5 0.5 0.0 0 0.5 1.0
In 'hard coding' I could write lots of lines like this:
df <- mutate(df, X01_01_n = ifelse(X01_01_p <= 2, 0, (ifelse(X01_01_p == 3, 0.5, 1))))
df <- mutate(df, X01_02_n = ifelse(X01_02_p <= 2, 0, (ifelse(X01_02_p == 3, 0.5, 1))))
But of course I am looking for a more elegant and quicker way to do this. I searched and searched but did not find a solution. I tried:
df <- cbind(df,apply(df[,cols_to_mutate],2, function(x) if (x < 3) { 0} else if (x > 3) {1} else {.5}))
But this does not work. Any ideas would be great!!
Answer 1:
If it isn't crucial that you keep the previous columns and you can instead mutate in place, you can use mutate_at with a case_when inside the function used to mutate. case_when makes use of the between function from dplyr to set up the conditions, then assigns a value with ~. The last argument, T ~ NA_real_, assigns NA to any observations that didn't match any of the conditions.
library(tidyverse)
cols_to_mutate <- c("X01_01_p","X01_02_p", "X01_03_p", "X01_04", "X01_05","X01_06")
df %>%
  mutate_at(cols_to_mutate, function(x) {
    case_when(
      between(x, 1, 2) ~ 0,
      x == 3 ~ 0.5,
      between(x, 4, 5) ~ 1,
      T ~ NA_real_
    )
  })
#> X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06
#> 1 3 0.0 3 0.0 1 0.5 1 1.0 0.5
#> 2 4 0.5 1 0.0 5 0.0 0 0.5 0.5
#> 3 2 0.0 3 0.0 2 0.0 1 0.0 0.0
#> 4 3 0.5 3 0.5 4 0.0 0 0.5 1.0
If it is necessary to keep the original columns and give new names to the rescaled columns, here is some rlang + purrr trickiness. I imap-ed over the columns of the data frame. If the name is in the list of columns to mutate, I use the same case_when as above and output a tibble with two columns: one is the original column, with its name assigned using quo_name and the := operator, and the other is the new values column, with the same name but _n appended. If it isn't a column to mutate, the function just returns a tibble of the original column. By using imap_dfc, all the columns are bound back together into one data frame.
df %>%
  imap_dfc(function(x, name) {
    if (name %in% cols_to_mutate) {
      new_vals <- case_when(
        between(x, 1, 2) ~ 0,
        x == 3 ~ 0.5,
        between(x, 4, 5) ~ 1,
        T ~ NA_real_
      )
      tibble(!!quo_name(name) := x, !!quo_name(paste0(name, "_n")) := new_vals)
    } else {
      tibble(!!quo_name(name) := x)
    }
  })
#> # A tibble: 4 x 15
#> X01_01 X01_01_p X01_01_p_n X01_02 X01_02_p X01_02_p_n X01_03 X01_03_p
#> <int> <int> <dbl> <int> <int> <dbl> <int> <int>
#> 1 3 2 0 3 1 0 1 3
#> 2 4 3 0.5 1 1 0 5 2
#> 3 2 1 0 3 1 0 2 2
#> 4 3 3 0.5 3 3 0.5 4 2
#> # ... with 7 more variables: X01_03_p_n <dbl>, X01_04 <int>,
#> # X01_04_n <dbl>, X01_05 <int>, X01_05_n <dbl>, X01_06 <int>,
#> # X01_06_n <dbl>
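Note that the output above names the new columns X01_01_p_n and so on, rather than the X01_01_n the question asked for. As a possible alternative, here is a minimal sketch that keeps the originals and produces the requested names with across() and rename_with(). It assumes dplyr 1.0.0 or later. The helper name rescale_score is made up for illustration, and the small data frame is a stand-in that copies three columns from the df printed in the question.

```r
library(dplyr)

# Stand-in data: three columns copied from the df printed in the question
df <- data.frame(
  X01_01   = c(3L, 4L, 2L, 3L),
  X01_01_p = c(2L, 3L, 1L, 3L),
  X01_04   = c(5L, 2L, 4L, 2L)
)
cols_to_mutate <- c("X01_01_p", "X01_04")

# Hypothetical helper: 1-2 -> 0, 3 -> 0.5, 4-5 -> 1, anything else -> NA
rescale_score <- function(x) {
  case_when(
    x <= 2 ~ 0,
    x == 3 ~ 0.5,
    x >= 4 ~ 1,
    TRUE   ~ NA_real_
  )
}

out <- df %>%
  # .names appends "_n" to each source column name and keeps the originals
  mutate(across(all_of(cols_to_mutate), rescale_score, .names = "{.col}_n")) %>%
  # turn X01_01_p_n into X01_01_n; columns without the "_p" part are untouched
  rename_with(~ sub("_p_n$", "_n", .x), ends_with("_p_n"))

out
```

The rename step is only needed because some source columns end in _p. If the new names did not follow a pattern, passing a named vector to rename() would be another option.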
Answer 2:
You could do something like this, which assumes your numbers only take the values 1 through 5.
map_marlein <- function(x) {
  if (any(!x %in% 1:5)) {
    stop("Needs numbers from 1-5")
  }
  as.integer(cut(x, c(0,2,3, 10))) / 2 - 0.5
}
df[, paste0(names(df), "_n")] <- lapply(df[, names(df)], map_marlein)
df
X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06 X01_01_n X01_01_p_n X01_02_n X01_02_p_n X01_03_n X01_03_p_n X01_04_n X01_05_n X01_06_n
1 3 2 3 1 1 3 5 4 3 0.5 0.0 0.5 0.0 0 0.5 1 1.0 0.5
2 4 3 1 1 5 2 2 3 3 1.0 0.5 0.0 0.0 1 0.0 0 0.5 0.5
3 2 1 3 1 2 2 4 1 2 0.0 0.0 0.5 0.0 0 0.0 1 0.0 0.0
4 3 3 3 3 4 2 2 3 4 0.5 0.5 0.5 0.5 1 0.0 0 0.5 1.0
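The code above rescales every column. If only the columns in cols_to_mutate should be rescaled, and the results stored under the names in new_cols, the same function can be applied to that subset. The following is a sketch under that assumption. The small data frame is a stand-in that copies three columns from the df printed in the question, and the two name vectors are shortened to match.

```r
# Stand-in data: three columns copied from the df printed in the question
df <- data.frame(
  X01_01   = c(3L, 4L, 2L, 3L),
  X01_01_p = c(2L, 3L, 1L, 3L),
  X01_04   = c(5L, 2L, 4L, 2L)
)
cols_to_mutate <- c("X01_01_p", "X01_04")
new_cols       <- c("X01_01_n", "X01_04_n")

map_marlein <- function(x) {
  if (any(!x %in% 1:5)) {
    stop("Needs numbers from 1-5")
  }
  # cut() bins x into (0,2], (2,3], (3,10], i.e. bin codes 1, 2, 3;
  # dividing by 2 and subtracting 0.5 maps those codes to 0, 0.5, 1
  as.integer(cut(x, c(0, 2, 3, 10))) / 2 - 0.5
}

# Rescale only the selected columns; the new columns take their names from new_cols
df[new_cols] <- lapply(df[cols_to_mutate], map_marlein)
df
```

This relies on cols_to_mutate and new_cols being in the same order, since the columns are matched by position.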
Source: https://stackoverflow.com/questions/50914681/r-mutate-multiple-columns-with-if-statement