Take this simple dataset and function (representative of more complex problems):
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n
To mimic the input format for Map
, we could call pmap
from purrr
in this way:
x[paste0("new",seq_along(ns))] <- pmap(list(x['a'], x['b'], ns), mult)
To fit this in a pipe:
x %>%
{list(.['a'], .['b'], ns)} %>%
pmap(mult) %>%
setNames(paste0('new', seq_along(ns))) %>%
cbind(x)
# new1 new2 a b
# 1 3 6 1 2
# 2 5 10 2 3
# 3 7 14 3 4
Apparently, this looks ugly compared to the concise base R code. But I could not think of a better way.
Here is one possibility.
library(purrr)
library(dplyr)
n <- 1:2
x %>%
mutate(val = pmap(., mult, n = n)) %>%
unnest() %>%
mutate(var = rep(paste0("new", n), nrow(.) / length(n))) %>%
spread(var, val)
# a b new1 new2
#1 1 2 3 6
#2 2 3 5 10
#3 3 4 7 14
Not pretty, so I'm also curious to see alternatives. A lot of excess comes about from unnest
ing the list
column and spread
ing into new columns.
Here is another possibility using pmap_dfc
plus an ugly as.data.frame(t(...))
call
bind_cols(x, as.data.frame(t(pmap_dfc(x, mult, n = n))))
# a b V1 V2
#1 1 2 3 6
#2 2 3 5 10
#3 3 4 7 14
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n
The best approach I've found (which is still not terribly elegant) is to pipe into bind_cols
. To get pmap_dfr
to work correctly, the function should return a named list (which may or may not be a data frame):
library(tidyverse)
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) as.list(set_names((a + b) * n, paste0('new', n)))
x %>% bind_cols(pmap_dfr(., mult, n = 1:2))
#> a b new1 new2
#> 1 1 2 3 6
#> 2 2 3 5 10
#> 3 3 4 7 14
To avoid changing the definition of mult
, you can wrap it in an anonymous function:
mult <- function(a,b,n) (a + b) * n
x %>% bind_cols(pmap_dfr(
.,
~as.list(set_names(
mult(...),
paste0('new', 1:2)
)),
n = 1:2
))
#> a b new1 new2
#> 1 1 2 3 6
#> 2 2 3 5 10
#> 3 3 4 7 14
In this particular case, it's not actually necessary to iterate over rows, though, because you can vectorize the inputs from x
and instead iterate over n
. The advantage is that usually n > p, so the number of iterations will be [potentially much] lower. To be clear, whether such an approach is possible depends on for which parameters the function can accept vector arguments.
mult
still needs to be called on the variables of x
. The simplest way to do this is to pass them explicitly:
x %>% bind_cols(map_dfc(1:2, ~mult(x$a, x$b, .x)))
#> a b V1 V2
#> 1 1 2 3 6
#> 2 2 3 5 10
#> 3 3 4 7 14
...but this loses the benefit of pmap
that named variables will automatically get passed to the correct parameter. You can get that back by using purrr::lift
, which is an adverb that changes the domain of a function so it accepts a list by wrapping it in do.call
. The returned function can be called on x
and the value of n
for that iteration:
x %>% bind_cols(map_dfc(1:2, ~lift(mult)(x, n = .x)))
This is equivalent to
x %>% bind_cols(map_dfc(1:2, ~invoke(mult, x, n = .x)))
but the advantage of the former is that it returns a function which can be partial
ly applied on x
so it only has an n
parameter left, and thus requires no explicit references to x
and so pipes better:
x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .)))
All return the same thing. Names can be fixed after the fact with %>% set_names(~sub('^V(\\d+)$', 'new\\1', .x))
, if you like.