Print data frame dimensions at each step of filtering

日久生厌 2021-01-14 19:32

I am using the tidyverse to filter a data frame and would like to print the dimensions (or number of rows) of the intermediate objects at each step. I thought I could simply use

3 Answers
  • 2021-01-14 19:46

    The tee pipe %T>% from the magrittr package was created for exactly this kind of case:

    library(magrittr)
    library(dplyr)
    mtcars %>%
      filter(cyl > 4)     %T>% {print(dim(.))} %>%
      filter(am == 0)     %T>% {print(dim(.))} %>%
      filter(disp >= 200) %T>% {print(dim(.))}
    

    This is very easy to read, and easy to edit out in RStudio using Alt + selection if you indent as I do.
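
    As a small illustration of why the tee pipe fits here: %T>% evaluates its right-hand side only for the side effect and then passes the left-hand side along unchanged. The sketch below reports nrow() rather than dim(); the label strings and the use of message() are my own choices for illustration, not part of the pipeline above.

    ```r
    library(magrittr)
    library(dplyr)

    # %T>% runs the right-hand side for its side effect only,
    # then returns the left-hand side unchanged.
    result <- mtcars %>%
      filter(cyl > 4) %T>%
      {message("after cyl filter: ", nrow(.), " rows")} %>%
      filter(am == 0) %T>%
      {message("after am filter: ", nrow(.), " rows")}

    # result is the filtered data frame, not the value returned by message()
    is.data.frame(result)
    ```

    Because message() writes to stderr, the row counts stay out of anything captured from stdout, which may or may not be what you want.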

    You can also use @hrbrmstr's function here if you don't like brackets, except you won't need the last line.


    Revisiting it months later, here's an idea generalizing @hrbrmstr's solution so you can print pretty much what you want and return the input to carry on with the pipe.

    library(tidyverse)
    pprint <- function(.data,.fun,...){
      .fun <- purrr::as_mapper(.fun)
      print(.fun(.data,...))
      invisible(.data)
    }
    
    iris %>%
      pprint(~"hello")           %>%
      head(2)                    %>%
      select(-Species)           %>%
      pprint(rowSums,na.rm=TRUE) %>%
      pprint(~rename_all(.[1:2],toupper)) %>%
      pprint(dim)
    
    # [1] "hello"
    #    1    2 
    # 10.2  9.5 
    #   SEPAL.LENGTH SEPAL.WIDTH
    # 1          5.1         3.5
    # 2          4.9         3.0
    # [1] 2 4
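
    Applied to the filtering pipeline from the question, the same helper might be used as sketched below. pprint() is repeated so the block runs on its own; mixing a bare function (dim) with a formula (~ nrow(.x)) is only meant to show that purrr::as_mapper() accepts both forms.

    ```r
    library(dplyr)
    library(purrr)

    # Same helper as above: print .fun(.data, ...) and pass .data along invisibly.
    pprint <- function(.data, .fun, ...) {
      .fun <- purrr::as_mapper(.fun)
      print(.fun(.data, ...))
      invisible(.data)
    }

    out <- mtcars %>%
      filter(cyl > 4)     %>% pprint(dim) %>%
      filter(am == 0)     %>% pprint(~ nrow(.x)) %>%
      filter(disp >= 200) %>% pprint(dim)
    ```

    Since the result is assigned to out, only the three pprint() calls write to the console.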
    
  • 2021-01-14 19:54

    @akrun's idea works, but it's not idiomatic tidyverse. Other functions in the tidyverse, like print() and glimpse(), return the data parameter invisibly so they can be piped without resorting to {}. Those {} make it difficult to clean up pipes after you're done exploring what's going on.

    Try:

    library(tidyverse)
    
    tidydim <- function(x) {
      print(dim(x))
      invisible(x)
    }
    
    mtcars %>%
      filter(cyl > 4) %>%
      tidydim() %>%
      filter(am == 0) %>%
      tidydim() %>%
      filter(disp >= 200) %>%
      tidydim()
    

    That way your "cleanup" (i.e. not producing interim console output) can be to quickly/easily remove the tidydim() lines or remove the print(…) from the function.
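
    If you would rather leave the calls in place, one possible variation is a switch that silences them all at once. This is only a sketch: the name tidydim2, the label and quiet arguments, and the tidydim.quiet option are all hypothetical additions, not part of the function above.

    ```r
    library(dplyr)

    # Hypothetical variant of tidydim(): `label`, `quiet` and the
    # "tidydim.quiet" option are illustrative, not an existing API.
    tidydim2 <- function(x, label = "",
                         quiet = getOption("tidydim.quiet", FALSE)) {
      if (!quiet) {
        # message() writes to stderr, keeping stdout free for real output
        message(label, "[", paste(dim(x), collapse = " x "), "]")
      }
      invisible(x)
    }

    res <- mtcars %>%
      filter(cyl > 4) %>%
      tidydim2("cyl > 4: ") %>%
      filter(am == 0) %>%
      tidydim2("am == 0: ")

    # Silence every later call without editing the pipeline:
    options(tidydim.quiet = TRUE)
    ```

    The trade-off is that the helper calls stay in the code, so this suits exploratory scripts better than code you intend to ship.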

  • 2021-01-14 20:00

    We could call print() within {}:

    mtcars %>%
       filter(cyl > 4) %>%
       {print(dim(.))
        filter(., am == 0) } %>%
       {print(dim(.))
        filter(., disp >= 200)} %>%
       {print(dim(.))
       .}
    #[1] 21 11
    #[1] 16 11
    #[1] 14 11
    #    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #1  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    #2  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    #3  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    #4  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    #5  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    #6  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    #7  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    #8  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    #9  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    #10 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    #11 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    #12 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    #13 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    #14 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    