I want to recode a bunch of variables with as few function calls as possible. I have one data.frame in which I want to recode a number of variables. I create a named list of all variable names and the recoding arguments I want to execute. Here I have no problem using map and dplyr. However, when it comes to recoding, I find it much easier to use recode from the car package instead of dplyr's own recoding function. A side question is whether there is a nice way of doing the same thing with dplyr::recode.
As a next step I break the data.frame down into a nested tibble. Here I want to do specific recodings in each subset. This is where things get complicated, and I am no longer able to do this in a dplyr pipe. The only thing I get working is a very ugly nested for loop.
Looking for ideas to do this in a nice and clean way.
Let's start with the easy example:
library(carData)
library(dplyr)
library(purrr)
library(tidyr)
# global recode list: one car-style recode string per variable
recode_ls = list(
mar = "'not married' = 0;
'married' = 1",
wexp = "'no' = 0;
'yes' = 1"
)
# the columns of Rossi that have an entry in recode_ls
recode_vars <- names(Rossi)[names(Rossi) %in% names(recode_ls)]
Rossi2 <- Rossi # let's save results under a different name
# recode_ls[[.x]] extracts the recode string itself (recode_ls[.x] would be a one-element list)
Rossi2[,recode_vars] <- recode_vars %>% map(~ car::recode(Rossi[[.x]],
recode_ls[[.x]],
as.factor = FALSE,
as.numeric = TRUE))
So far this seems pretty clean to me, apart from the fact that car::recode is much easier to use than dplyr::recode.
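For the side question, here is a minimal sketch of the same global recoding with dplyr::recode instead of car::recode. It assumes the codes are stored as named lists (old level = new value) rather than car-style strings; recode_codes and Rossi3 are illustrative names. Since dplyr::recode accepts a spliced list via !!!, no quosures are needed when it is called directly like this.

```r
library(carData)
library(dplyr)
library(purrr)

# Illustrative stand-in for recode_ls: for each variable, a named list
# mapping old factor level -> new value
recode_codes <- list(
  mar  = list(`not married` = 0, married = 1),
  wexp = list(no = 0, yes = 1)
)

Rossi3 <- Rossi
# imap() passes each code list as .x and its variable name as .y;
# !!! splices the code list into recode() as named arguments
Rossi3[names(recode_codes)] <- imap(
  recode_codes,
  ~ dplyr::recode(Rossi[[.y]], !!!.x)
)
```

Because every factor level is mapped to a number, recode() returns numeric vectors here, which corresponds to as.numeric = TRUE in the car version.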
Here comes my actual problem. What I am trying to do is recode (in this easy example) the variables mar and wexp differently in each tibble subset. In my real data set, the variables I want to recode in each subset are many more and have different names too. Does anyone have a good idea how to do this nicely and cleanly using a dplyr pipe and map?
nested_rossi <- as_tibble(Rossi) %>% nest(data = -race) # tidyr >= 1.0; older versions accepted nest(-race)
recode_wexp_ls = list(
no = list(
mar = "'not married' = 0;
'married' = 1",
wexp = "'no' = 0;
'yes' = 1"
),
yes = list(
mar = "'not married' = 1;
'married' = 2",
wexp = "'no' = 1;
'yes' = 2"
)
)
We could also attach the list to the nested data.frame, but I'm not sure if this would make things more efficient.
nested_rossi$recode = list(
no = list(
mar = "'not married' = 0;
'married' = 1",
wexp = "'no' = 0;
'yes' = 1"
),
yes = list(
mar = "'not married' = 1;
'married' = 2",
wexp = "'no' = 1;
'yes' = 2"
)
)
Thanks for a cool question! This is a great chance to use all the power of metaprogramming.
First, let's examine the recode() function. It takes a vector and an arbitrary number of named arguments, and returns the same vector with matching values replaced by the argument values:
library(dplyr)
x <- c("a", "b", "c")
recode(x, a = "Z", c = "X")
#> [1] "Z" "b" "X"
recode's help says that we can use unquote splicing (!!!) to pass a named list into it.
x_codes <- list(a = "Z", c = "X")
recode(x, !!!x_codes)
#> [1] "Z" "b" "X"
This ability can be used when mutating a data frame. Suppose we have a subset of the Rossi dataset:
library(carData)
library(tidyverse)
rossi <- Rossi %>%
as_tibble() %>%
select(mar, wexp)
To mutate two variables in a single function call, we can use this snippet (note that both the named-argument and the unquote-splicing approaches work):
mar_codes <- list(`not married` = 0, married = 1)
wexp_codes <- list(no = 0, yes = 1)
rossi %>%
mutate(
mar_code = recode(mar, "not married" = 0, "married" = 1),
wexp_code = recode(wexp, !!!wexp_codes)
)
#> # A tibble: 432 x 4
#> mar wexp mar_code wexp_code
#> <fct> <fct> <dbl> <dbl>
#> 1 not married no 0 0
#> 2 not married no 0 0
#> 3 not married yes 0 1
#> 4 married yes 1 1
#> 5 not married yes 0 1
So, unquote splicing is a good way to pass multiple arguments to a function that uses non-standard evaluation.
Now suppose we have a list of lists of codes:
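Why does !!! work in recode()? As far as I understand, functions that collect their ... with rlang::list2() (rlang's "dynamic dots") accept !!!, and dplyr documents the same behaviour for recode(). A minimal sketch with a hypothetical helper, count_args(), that only counts the arguments it receives:

```r
library(rlang)

# Hypothetical helper: collecting the dots with rlang::list2() is what
# lets callers splice a list into separate arguments with !!!
count_args <- function(...) {
  args <- list2(...)
  length(args)
}

x_codes <- list(a = "Z", c = "X")
count_args(!!!x_codes)          # 2: the list is spliced into two arguments
count_args(x_codes)             # 1: without !!! the list is a single argument
count_args(b = "Y", !!!x_codes) # 3: spliced and regular arguments can be mixed
```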
mapping <- list(mar = mar_codes, wexp = wexp_codes)
mapping
#> $mar
#> $mar$`not married`
#> [1] 0
#> $mar$married
#> [1] 1
#> $wexp
#> $wexp$no
#> [1] 0
#> $wexp$yes
#> [1] 1
What we need is to transform this list into a list of expressions to place inside mutate():
expressions <- mapping %>%
imap(
~ quo(
recode(!!sym(.y), !!!.x)
)
)
expressions
#> $mar
#> <quosure>
#> expr: ^recode(mar, not married = 0, married = 1)
#> env: 0x7fbf374513c0
#> $wexp
#> <quosure>
#> expr: ^recode(wexp, no = 0, yes = 1)
#> env: 0x7fbf37453468
The last step: splice this list of expressions into mutate() and see what it does:
mutate(rossi, !!!expressions)
#> # A tibble: 432 x 2
#> mar wexp
#> <dbl> <dbl>
#> 1 0 0
#> 2 0 0
#> 3 0 1
#> 4 1 1
#> 5 0 1
Now you can extend your lists of variables to recode, handle several lists at once, and so on.
With such a powerful technique (metaprogramming) you can do amazing things. I strongly recommend delving into this topic, and there is no better resource to start with than Hadley Wickham's Advanced R book.
Hope this is what you have been looking for.
Update
Diving deeper. The question was: how can this technique be applied to a tibble-column (a list-column of tibbles)?
Let's create a nested tibble of group and df (our data to recode):
rossi <-
head(Rossi, 5) %>%
as_tibble() %>%
select(mar, wexp)
nested <- tibble(group = c("yes", "no"), df = list(rossi))
nested looks like this:
# A tibble: 2 x 2
group df
<chr> <list>
1 yes <tibble [5 × 2]>
2 no <tibble [5 × 2]>
We already know how to build a list of expressions from the list of codes. Let's create a function to handle it for us.
build_recode_expressions <- function(list_of_codes) {
imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}
Here, the list_of_codes argument is a named list with one set of codes for each variable to recode.
Assuming we have a list of several sets of recoding codes, we can transform it into a list of lists of expressions. The number of variables in each set may differ.
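A quick way to check what build_recode_expressions() returns is to inspect the quosures with rlang::quo_get_expr(). A minimal sketch, using codes like those of the no group in the example that follows (the object name exprs_no is illustrative):

```r
library(dplyr)
library(purrr)
library(rlang)

build_recode_expressions <- function(list_of_codes) {
  imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}

exprs_no <- build_recode_expressions(list(
  mar  = list(`not married` = 10, married = 20),
  wexp = list(no = "NOOOO", yes = "YEEEES")
))

names(exprs_no)            # one quosure per variable: "mar" and "wexp"
quo_get_expr(exprs_no$mar) # the recode() call with `mar` and the spliced codes
```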
codes <- list(
yes = list(mar = list(`not married` = 0, married = 1)),
no = list(
mar = list(`not married` = 10, married = 20),
wexp = list(no = "NOOOO", yes = "YEEEES")
)
)
exprs <- map(codes, build_recode_expressions)
Now we can easily add exprs to the nested data frame as a new list-column.
Another function may be useful for further work. It takes a data frame and a list of quoted expressions and returns a new data frame with the recoded columns.
recode_df <- function(df, exprs) mutate(df, !!!exprs)
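As a usage example, recode_df() can be tried on a single data frame before mapping over the whole list-column. A minimal sketch on a made-up two-row tibble, toy, standing in for one element of df:

```r
library(dplyr)
library(purrr)
library(rlang)

build_recode_expressions <- function(list_of_codes) {
  imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}
recode_df <- function(df, exprs) mutate(df, !!!exprs)

# Made-up data standing in for one element of the df column
toy <- tibble(
  mar  = factor(c("not married", "married")),
  wexp = factor(c("no", "yes"))
)

# Only mar has codes, so wexp is left untouched
toy_recoded <- recode_df(
  toy,
  build_recode_expressions(list(mar = list(`not married` = 10, married = 20)))
)
toy_recoded
```

Because the quosures are named after existing columns, mutate() overwrites mar in place rather than adding a new column.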
It's time to put it all together.
We have the tibble-column df, the list-column exprs, and the function recode_df that binds them together, one row at a time.
The key is the map2 function, which iterates over two lists in parallel:
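# exprs is matched to the rows of nested by position (yes, then no)
nested %>%
mutate(exprs = exprs) %>%
mutate(df_recoded = map2(df, exprs, recode_df)) %>%
# tidyr < 1.0 syntax; with tidyr >= 1.0 use unnest(c(df, df_recoded), names_sep = "_")
unnest(df, df_recoded)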
And this is the output:
# A tibble: 10 x 5
group mar wexp mar1 wexp1
<chr> <fct> <fct> <dbl> <chr>
1 yes not married no 0 no
2 yes not married no 0 no
3 yes not married yes 0 yes
4 yes married yes 1 yes
5 yes not married yes 0 yes
6 no not married no 10 NOOOO
7 no not married no 10 NOOOO
8 no not married yes 10 YEEEES
9 no married yes 20 YEEEES
10 no not married yes 10 YEEEES
I hope this update will solve your problem.
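The update uses a toy nested tibble whose rows line up with exprs by position. For the question's own setup (Rossi nested by race), here is a sketch that looks the codes up by group name instead, so row order does not matter. It assumes tidyr >= 1.0 (for nest(data = ...)); codes_by_race, with one entry per level of Rossi$race ("black" and "other"), is hypothetical and its values are purely illustrative.

```r
library(carData)
library(dplyr)
library(purrr)
library(tidyr)
library(rlang)

build_recode_expressions <- function(list_of_codes) {
  imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}
recode_df <- function(df, exprs) mutate(df, !!!exprs)

# Hypothetical codes keyed by the levels of Rossi$race
codes_by_race <- list(
  black = list(mar  = list(`not married` = 0, married = 1),
               wexp = list(no = 0, yes = 1)),
  other = list(mar  = list(`not married` = 1, married = 2),
               wexp = list(no = 1, yes = 2))
)

recoded <- Rossi %>%
  as_tibble() %>%
  select(race, mar, wexp) %>%
  nest(data = -race) %>%
  mutate(
    # look the codes up by group label, not by row position
    exprs = map(as.character(race), ~ build_recode_expressions(codes_by_race[[.x]])),
    data  = map2(data, exprs, recode_df)
  ) %>%
  select(-exprs) %>%
  unnest(data)
```

as.character(race) is used because race is a factor, and R indexes with a factor's integer codes rather than its labels.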
Source: https://stackoverflow.com/questions/56636417/bunch-recoding-of-variables-in-the-tidyverse-functional-meta-programing