Bunch recoding of variables in the tidyverse (functional / meta-programming)

会有一股神秘感。 Submitted on 2019-12-04 19:24:54

Thanks for a cool question! This is a great chance to use all the power of metaprogramming.

First, let's examine the recode() function. It takes a vector and an arbitrary number of named arguments, and returns the vector with each value that matches an argument name replaced by that argument's value:

x <- c("a", "b", "c")
recode(x, a = "Z", c = "X")

#> [1] "Z" "b" "X"

recode's help says that we can use unquote splicing (!!!) to pass a named list into it.

x_codes <- list(a = "Z", c = "X")
recode(x, !!!x_codes)

#> [1] "Z" "b" "X"
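To see what the splice actually does, here is a minimal sketch that builds the call with rlang's expr() instead of running it. The name spliced is only for illustration; the call is captured, not evaluated, so only rlang needs to be loaded.

```r
library(rlang)

x_codes <- list(a = "Z", c = "X")

# expr() captures the call without evaluating it; !!! inlines each
# element of the named list as a separate named argument.
spliced <- expr(recode(x, !!!x_codes))
print(spliced)
#> recode(x, a = "Z", c = "X")
```

In other words, the spliced call is the same call we typed by hand in the first example.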

This is useful when mutating a data frame. Suppose we have a subset of the Rossi dataset:

library(carData)
library(tidyverse)

rossi <- Rossi %>% 
  as_tibble() %>% 
  select(mar, wexp)

To mutate two variables in a single function call we can use this snippet (note that both named arguments and unquote splicing approaches work well):

mar_codes <- list(`not married` = 0, married = 1)
wexp_codes <- list(no = 0, yes = 1)

rossi %>% 
  mutate(
    mar_code = recode(mar, "not married" = 0, "married" = 1),
    wexp_code = recode(wexp, !!!wexp_codes)
  )

#> # A tibble: 432 x 4
#>    mar         wexp  mar_code wexp_code
#>    <fct>       <fct>    <dbl>     <dbl>
#>  1 not married no           0         0
#>  2 not married no           0         0
#>  3 not married yes          0         1
#>  4 married     yes          1         1
#>  5 not married yes          0         1

So unquote splicing is a good way to pass multiple arguments to a function that uses non-standard evaluation.

Now suppose we have a list of lists of codes:

mapping <- list(mar = mar_codes, wexp = wexp_codes)
mapping

#> $mar
#> $mar$`not married`
#> [1] 0

#> $mar$married
#> [1] 1

#> $wexp
#> $wexp$no
#> [1] 0

#> $wexp$yes
#> [1] 1

What we need is to transform this list into a list of expressions to place inside mutate():

expressions <- mapping %>% 
  imap(
    ~ quo(
      recode(!!sym(.y), !!!.x)
    )
  )

expressions

#> $mar
#> <quosure>
#> expr: ^recode(mar, not married = 0, married = 1)
#> env:  0x7fbf374513c0

#> $wexp
#> <quosure>
#> expr: ^recode(wexp, no = 0, yes = 1)
#> env:  0x7fbf37453468

The last step: splice this list of expressions into mutate() and see what it does:

mutate(rossi, !!!expressions)

#> # A tibble: 432 x 2
#>      mar  wexp
#>    <dbl> <dbl>
#>  1     0     0
#>  2     0     0
#>  3     0     1
#>  4     1     1
#>  5     0     1
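Note that the result above has only two columns: the expressions are named mar and wexp, so mutate() overwrites the original columns. If you would rather keep the originals, one option is to rename the list before splicing. This is a sketch on a small made-up tibble (df and the _code suffix are my own choices, not part of the question):

```r
library(dplyr)
library(purrr)
library(rlang)

# Toy data standing in for the Rossi subset (character columns for simplicity)
df <- tibble(
  mar  = c("not married", "married"),
  wexp = c("no", "yes")
)

mapping <- list(
  mar  = list(`not married` = 0, married = 1),
  wexp = list(no = 0, yes = 1)
)

# Same construction as above: one quoted recode() call per variable
expressions <- imap(mapping, ~ quo(recode(!!sym(.y), !!!.x)))

# The names of the list become the names of the new columns
names(expressions) <- paste0(names(expressions), "_code")

result <- mutate(df, !!!expressions)
names(result)
#> [1] "mar"       "wexp"      "mar_code"  "wexp_code"
```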

Now you can extend your lists of variables to recode, handle several lists at once, and so on.

Metaprogramming is a powerful technique, and I strongly recommend digging deeper into the topic. There is no better place to start than Hadley Wickham's Advanced R book.

I hope this is what you were looking for.

Update

Diving deeper. The question was: how to apply this technique to a tibble-column?

Let's create a nested tibble with two columns: group and df (the data to recode).

rossi <- 
  head(Rossi, 5) %>% 
  as_tibble() %>% 
  select(mar, wexp)

nested <- tibble(group = c("yes", "no"), df = list(rossi))

nested looks like:

# A tibble: 2 x 2
  group df              
  <chr> <list>          
1 yes   <tibble [5 × 2]>
2 no    <tibble [5 × 2]>

We already know how to build a list of expressions from the list of codes. Let's create a function to handle it for us.

build_recode_expressions <- function(list_of_codes) {
  imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}

Here, the list_of_codes argument is a named list that holds one list of codes for each variable to recode.

If we have several such sets of recoding codes, we can transform them into a list of lists of expressions. The number of variables in each set may be arbitrary.

codes <- list(
  yes = list(mar = list(`not married` = 0, married = 1)),
  no = list(
    mar = list(`not married` = 10, married = 20), 
    wexp = list(no = "NOOOO", yes = "YEEEES")
  )
)

exprs <- map(codes, build_recode_expressions)

Now we can easily add exprs to the nested data frame as a new list-column.

Another function will be useful for the next step. It takes a data frame and a list of quoted expressions and returns a new data frame with recoded columns.

recode_df <- function(df, exprs) mutate(df, !!!exprs)

It's time to put everything together. We have the tibble-column df, the list-column exprs, and the function recode_df, which combines one data frame with one list of expressions at a time.

The key is the map2 function. It allows us to iterate over two lists simultaneously:

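nested %>% 
  # exprs is matched to the rows by position, so its order must follow `group`
  mutate(exprs = exprs) %>% 
  mutate(df_recoded = map2(df, exprs, recode_df)) %>% 
  # pre-1.0 tidyr interface; newer versions warn and expect unnest(cols = c(...))
  unnest(df, df_recoded)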

And this is the output:

# A tibble: 10 x 5
   group mar         wexp   mar1 wexp1 
   <chr> <fct>       <fct> <dbl> <chr> 
 1 yes   not married no        0 no    
 2 yes   not married no        0 no    
 3 yes   not married yes       0 yes   
 4 yes   married     yes       1 yes   
 5 yes   not married yes       0 yes   
 6 no    not married no       10 NOOOO 
 7 no    not married no       10 NOOOO 
 8 no    not married yes      10 YEEEES
 9 no    married     yes      20 YEEEES
10 no    not married yes      10 YEEEES
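One caveat: mutate(exprs = exprs) attaches the expressions by position, so it only works when the list is in the same order as the group column. A sketch of a name-based lookup follows, using a made-up toy tibble with the groups deliberately in the opposite order; toy, exprs_by_group and recoded are hypothetical names for this illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

build_recode_expressions <- function(list_of_codes) {
  imap(list_of_codes, ~ quo(recode(!!sym(.y), !!!.x)))
}

recode_df <- function(df, exprs) mutate(df, !!!exprs)

# Toy data; note that the groups are listed as "no", "yes"
toy <- tibble(mar = c("not married", "married"))
nested <- tibble(group = c("no", "yes"), df = list(toy, toy))

# ...while the codes are listed as yes, no
codes <- list(
  yes = list(mar = list(`not married` = 0, married = 1)),
  no  = list(mar = list(`not married` = 10, married = 20))
)
exprs_by_group <- map(codes, build_recode_expressions)

recoded <- nested %>%
  # index the list by the group column so each row gets its own codes
  mutate(exprs = exprs_by_group[group]) %>%
  mutate(df_recoded = map2(df, exprs, recode_df))

recoded$df_recoded[[1]]$mar  # group "no"
#> [1] 10 20
```

Indexing by name keeps the codes attached to the right group even if the rows of nested are reordered.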

I hope this update solves your problem.
