get lhs object name when piping with dplyr

后端 未结 5 2115
遥遥无期
遥遥无期 2021-02-13 15:43

I\'d like to have a function that can use pipe operator as exported from dplyr. I am not using magrittr.

df %>% my_function

How can I get df

相关标签:
5条回答
  • 2021-02-13 16:06

    Here's a hacky way of doing it, which I'm sure breaks in a ton of edge cases:

    library(data.table) # for the address function
                        # or parse .Internal(inspect if you feel masochistic
    
    fn = function(tbl) {
      objs = ls(parent.env(environment()))
      objs[sapply(objs,
              function(x) address(get(x, env = parent.env(environment()))) == address(tbl))]
    }
    
    df = data.frame(a = 1:10)
    df %>% fn
    #[1] "df"
    
    0 讨论(0)
  • 2021-02-13 16:06

    Inspired by the link mentioned by gersht

    You can go back 5 generations to get the name

    df %>% {parent.frame(5)$lhs}
    

    example as below:

    library(dplyr)
    
    a <- 1
    
    df1 <- data.frame(a = 1:10)
    
    df2 <- data.frame(a = 1:10)
    
    a %>% {parent.frame(5)$lhs}
    
    df1 %>% {parent.frame(5)$lhs}
    
    df2 %>% {parent.frame(5)$lhs}
    
    0 讨论(0)
  • 2021-02-13 16:09

    I don't believe this is possible without adding an extra argument to your my_function. When chaining functions with dplyr it automatically converts the df to a tbl_df object, hence the new name "." within the dplyr scope to make the piping simpler.

    The following is a very hacky way with dplyr which just adds an addition argument to return the name of the original data.frame

    my_function <- function(tbl, orig.df){print(deparse(substitute(orig.df)))}
    df %>% my_function(df)
    [1] "df"
    

    Note you couldn't just pass the df with your original function because the tbl_df object is automatically passed to all subsequent functions.

    0 讨论(0)
  • 2021-02-13 16:12

    Although the question is an old one, and the bounty has already been awarded, I would like to extend on gersht's excellent answer which works perfectly fine for getting the most lefthand-side object name. However, integrating this functionality in a dplyr workflow is not yet solved, apart from using this approach in the very last step of a pipe.

    Since I'm using dplyr a lot, I have created a group of custom wrapper functions around the common dplyr verbs which I call metadplyr (I'm still playing around with the functionality, which is why I haven't uploaded it on github yet).

    In essence, those functions create a new class called meta_tbl on top of a tibble and write certain things in the attributes of that object. Applied to the problem of the OP I provide a simple example with filter, but the procedure works on any other dplyr verb as well.

    In my original function family I use slightly different names than dplyr, but the approach also works when 'overwriting' the original dplyr verbs.

    Below is a new filter function which turns a data frame or tibble into a meta_tbl and writes the original name of the lhs object into the attribute .name. Here I am using a short version of gersht's approach.

    library(dplyr)
    
     filter <- function(.data, ...) {
    
        if(!("meta_tbl" %in% class(.data))) {
    
          .data2 <- as_tibble(.data)
    
          # add new class 'meta_tbl' to data.frame  
          attr(.data2, "class") <- c(attr(.data2, "class"), "meta_tbl")
    
          # write lhs original name into attributes
          i <- 1
          while(!("chain_parts" %in% ls(envir=parent.frame(i)))) {
            i <- i+1
          }
          attr(.data2, ".name") <- deparse(parent.frame(i)$lhs)
    
        }
    
        dplyr::filter(.data2, ...)
    
    }
    

    For convenience it is good to have some helper function to extract the original name from the attributes easily.

    .name <- function(.data) {
      if("meta_tbl" %in% class(.data)) {
      attr(.data, ".name")
      } else stop("this function only work on objects of class 'meta_tbl'")
    
    }
    

    Both functions can be used in a workflow in the following way:

    mtcars %>% 
      filter(gear == 4) %>% 
      write.csv(paste0(.name(.), ".csv"))
    

    This might be a bad example, since the pipe doesn't continue, but in theory, we could use this pipe including the original name and pipe it in further function calls.

    0 讨论(0)
  • 2021-02-13 16:18

    The SO answer that JBGruber links to in the comments mostly solves the problem. It works by moving upwards through execution environments until a certain variable is found, then returns the lhs from that environment. The only thing missing is the requirement that the function outputs both the name of the original data frame and the manipulated data – I gleaned the latter requirement from one of the OP's comments. For that we just need to output a list containing these things, which we can do by modifying MrFlick's answer:

    get_orig_name <- function(df){
        i <- 1
        while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
            i <- i+1
        }
        list(name = deparse(parent.frame(i)$lhs), output = df)
    }
    

    Now we can run get_orig_name to the end of any pipeline to the get the manipulated data and the original data frame's name in a list. We access both using $:

    mtcars %>% summarize_all(mean) %>% get_orig_name
    
    #### OUTPUT ####
    
    $name
    [1] "mtcars"
    
    $output
           mpg    cyl     disp       hp     drat      wt     qsec     vs      am   gear   carb
    1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
    

    I should also mention that, although I think the details of this strategy are interesting, I also think it is needlessly complicated. It sounds like the OP's goal is to manipulate the data and then write it to a file with the same name as the original, unmanipulated, data frame, which can easily be done using more straightforward methods. For example, if we are dealing with multiple data frames we can just do something like the following:

    df_list <- list(mtcars = mtcars, iris = iris)
    
    for(name in names(df_list)){
        df_list[[name]] %>% 
            group_by_if(is.factor) %>%
            summarise_all(mean) %>% 
            write.csv(paste0(name, ".csv"))
    }
    
    0 讨论(0)
提交回复
热议问题