Stepping through a pipeline with intermediate results

Is there a way to output the result of a pipeline at each step without doing it manually (e.g. without selecting and running only the selected chunks)?

I ofte

5 Answers

抹茶落季

2021-01-01 20:39

This is easy with a magrittr function chain (a functional sequence, built by starting the pipe with a dot). For example, define a function my_chain with:

    library(magrittr)

    foo <- function(x) x + 1
    bar <- function(x) x + 1
    baz <- function(x) x + 1
    my_chain <- . %>% foo %>% bar %>% baz

and get the final result of a chain as:

    > my_chain(0)
    [1] 3

You can get a function list with functions(my_chain) and define a "stepper" function like this:

    stepper <- function(fun_chain, x, FUN = print) {
      f_list <- functions(fun_chain)
      for (i in seq_along(f_list)) {
        x <- f_list[[i]](x)
        FUN(x)
      }
      invisible(x)
    }

And run the chain with an interposed print function:

    stepper(my_chain, 0, print)
    # [1] 1
    # [1] 2
    # [1] 3

Or pause for user input after each step:

    stepper(my_chain, 0, function(x) {print(x); readline()})
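If the steps are available as a plain list of functions rather than a magrittr function chain, base R's Reduce with accumulate = TRUE can collect every intermediate value in one call. A minimal sketch, assuming the same toy foo, bar and baz as above; the names steps and trace are illustrative only:

```r
# Toy steps, mirroring the example above.
foo <- function(x) x + 1
bar <- function(x) x + 1
baz <- function(x) x + 1

steps <- list(foo = foo, bar = bar, baz = baz)

# Apply each step to the running value. accumulate = TRUE keeps the
# initial value plus the result after every step.
trace <- Reduce(function(acc, f) f(acc), steps, 0, accumulate = TRUE)

# Label each entry with the step that produced it.
names(trace) <- c("input", names(steps))

print(trace)
```

Unlike stepper, this keeps the intermediate results in an object, so they can be inspected after the run instead of only being printed.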

再見小時候

2021-01-01 20:39

I wrote the package pipes, which can do several things that might help:

use %P>% to print the output.

use %ae>% to use all.equal on input and output.

use %V>% to use View on the output, it will open a viewer for each relevant step.

If you want to see some aggregated info, you can try %summary>%, %glimpse>% or %skim>%, which use summary, tibble::glimpse or skimr::skim respectively. You can also define your own pipe to show specific changes, using new_pipe.

    # devtools::install_github("moodymudskipper/pipes")
    library(dplyr)
    library(pipes)

    res <- mtcars %P>%
      group_by(cyl) %P>%
      sample_frac(0.1) %P>%
      summarise(res = mean(mpg))
    #> group_by(., cyl)
    #> # A tibble: 32 x 11
    #> # Groups:   cyl [3]
    #>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
    #>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
    #>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
    #>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
    #>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
    #>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
    #>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
    #>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
    #>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
    #> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
    #> # ... with 22 more rows
    #> sample_frac(., 0.1)
    #> # A tibble: 3 x 11
    #> # Groups:   cyl [3]
    #>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1  26       4  120.    91  4.43  2.14  16.7     0     1     5     2
    #> 2  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
    #> 3  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
    #> summarise(., res = mean(mpg))
    #> # A tibble: 3 x 2
    #>     cyl   res
    #>   <dbl> <dbl>
    #> 1     4  26
    #> 2     6  17.8
    #> 3     8  18.7

    res <- mtcars %ae>%
      group_by(cyl) %ae>%
      sample_frac(0.1) %ae>%
      summarise(res = mean(mpg))
    #> group_by(., cyl)
    #> [1] "Attributes: < Names: 1 string mismatch >"
    #> [2] "Attributes: < Length mismatch: comparison on first 2 components >"
    #> [3] "Attributes: < Component \"class\": Lengths (1, 4) differ (string compare on first 1) >"
    #> [4] "Attributes: < Component \"class\": 1 string mismatch >"
    #> [5] "Attributes: < Component 2: Modes: character, list >"
    #> [6] "Attributes: < Component 2: Lengths: 32, 2 >"
    #> [7] "Attributes: < Component 2: names for current but not for target >"
    #> [8] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"
    #> [9] "Attributes: < Component 2: target is character, current is tbl_df >"
    #> sample_frac(., 0.1)
    #> [1] "Different number of rows"
    #> summarise(., res = mean(mpg))
    #> [1] "Cols in y but not x: `res`. "
    #> [2] "Cols in x but not y: `qsec`, `wt`, `drat`, `hp`, `disp`, `mpg`, `carb`, `gear`, `am`, `vs`. "

    res <- mtcars %V>%
      group_by(cyl) %V>%
      sample_frac(0.1) %V>%
      summarise(res = mean(mpg))
    # you'll have to test this one by yourself

后悔当初

2021-01-01 20:41

IMHO magrittr is mostly useful interactively, that is, when I am exploring data or building a new formula/model.

In these cases, storing intermediate results in distinct variables is time-consuming and distracting, while pipes let me focus on the data rather than on typing:

    x %>% foo
    ## reason on results and
    x %>% foo %>% bar
    ## reason on results and
    x %>% foo %>% bar %>% baz
    ## etc.

The problem here is that I don't know in advance what the final pipe will be, which the approach in @bergant's answer requires.

Typing, as in @zx8754,

x %>% print %>% foo %>% print %>% bar %>% print %>% baz

adds too much overhead and, to me, defeats the whole purpose of magrittr.

Essentially magrittr lacks a simple operator that both prints and pipes results.
The good news is that it seems quite easy to craft one:

    `%P>%` <- function(lhs, rhs) {
      print(lhs)
      lhs %>% rhs
    }

Now you can print and pipe:

    1:4 %P>% sqrt %P>% sum
    ## [1] 1 2 3 4
    ## [1] 1.000000 1.414214 1.732051 2.000000
    ## [1] 6.146264
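The example above uses the operator with bare function names, and it prints the input of each step. A base-R variant that needs no magrittr at all and prints the result of each step instead might look like the sketch below. The name %then_print% is made up for this illustration, and the operator assumes that the right-hand side is always a function object:

```r
# Hypothetical operator (not part of magrittr): apply the function on the
# right to the value on the left, print the result, and pass it on.
`%then_print%` <- function(lhs, rhs) {
  stopifnot(is.function(rhs))
  res <- rhs(lhs)
  print(res)
  invisible(res)
}

# Each step prints its own result; the final value is also stored.
total <- 1:4 %then_print% sqrt %then_print% sum

# Steps that need arguments can be wrapped in an anonymous function.
big <- 1:4 %then_print% (function(v) v[v > 2]) %then_print% sum
```

The trade-off is that a step such as head(2) has to be written as a function, which is more typing than a magrittr-style call but avoids any non-standard evaluation.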

I found that if one defines key bindings for %P>% and %>%, the prototyping workflow becomes very streamlined (see Emacs ESS or RStudio).

闹比i

2021-01-01 20:42

You can select which results to print by using the tee-operator (%T>%) and print(). The tee-operator is used exclusively for side-effects like printing.

    # i.e.
    mtcars %>%
      group_by(cyl) %T>%
      print() %>%
      sample_frac(0.1) %T>%
      print() %>%
      summarise(res = mean(mpg))
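To see why %T>% suits side effects: it evaluates the right-hand side but passes the left-hand value along, so a step that returns NULL (as cat does) does not break the chain. A small sketch, assuming magrittr is installed; log_len is a hypothetical helper written for this example:

```r
library(magrittr)

# Hypothetical helper: reports the length of its input.
# It returns NULL, because that is what cat() returns.
log_len <- function(x) cat("length:", length(x), "\n")

res <- c(1, 4, 9, 16) %T>%
  log_len() %>%   # side effect only; the vector itself is passed on
  sqrt() %T>%
  log_len() %>%
  sum()

res
```

With a plain %>% in place of %T>%, sqrt() would receive the NULL returned by log_len() instead of the data.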

挽巷

2021-01-01 20:46

Add print:

    mtcars %>%
      group_by(cyl) %>%
      print %>%
      sample_frac(0.1) %>%
      print %>%
      summarise(res = mean(mpg))
