I\'d like to have a function that can use pipe operator as exported from dplyr. I am not using magrittr.
df %>% my_function
How can I get df
Here's a hacky way of doing it, which I'm sure breaks in a ton of edge cases:
library(data.table) # for the address function
# or parse .Internal(inspect if you feel masochistic
fn = function(tbl) {
objs = ls(parent.env(environment()))
objs[sapply(objs,
function(x) address(get(x, env = parent.env(environment()))) == address(tbl))]
}
df = data.frame(a = 1:10)
df %>% fn
#[1] "df"
Inspired by the link mentioned by gersht
You can go back 5 generations to get the name
df %>% {parent.frame(5)$lhs}
example as below:
library(dplyr)
a <- 1
df1 <- data.frame(a = 1:10)
df2 <- data.frame(a = 1:10)
a %>% {parent.frame(5)$lhs}
df1 %>% {parent.frame(5)$lhs}
df2 %>% {parent.frame(5)$lhs}
I don't believe this is possible without adding an extra argument to your my_function
. When chaining functions with dplyr
it automatically converts the df
to a tbl_df
object, hence the new name "."
within the dplyr
scope to make the piping simpler.
The following is a very hacky way with dplyr which just adds an addition argument to return the name of the original data.frame
my_function <- function(tbl, orig.df){print(deparse(substitute(orig.df)))}
df %>% my_function(df)
[1] "df"
Note you couldn't just pass the df
with your original function because the tbl_df
object is automatically passed to all subsequent functions.
Although the question is an old one, and the bounty has already been awarded, I would like to extend on gersht's excellent answer which works perfectly fine for getting the most lefthand-side object name. However, integrating this functionality in a dplyr workflow is not yet solved, apart from using this approach in the very last step of a pipe.
Since I'm using dplyr a lot, I have created a group of custom wrapper functions around the common dplyr verbs which I call metadplyr
(I'm still playing around with the functionality, which is why I haven't uploaded it on github yet).
In essence, those functions create a new class called meta_tbl
on top of a tibble and write certain things in the attributes of that object. Applied to the problem of the OP I provide a simple example with filter
, but the procedure works on any other dplyr verb as well.
In my original function family I use slightly different names than dplyr, but the approach also works when 'overwriting' the original dplyr verbs.
Below is a new filter function which turns a data frame or tibble into a meta_tbl and writes the original name of the lhs object into the attribute .name
. Here I am using a short version of gersht's approach.
library(dplyr)
filter <- function(.data, ...) {
if(!("meta_tbl" %in% class(.data))) {
.data2 <- as_tibble(.data)
# add new class 'meta_tbl' to data.frame
attr(.data2, "class") <- c(attr(.data2, "class"), "meta_tbl")
# write lhs original name into attributes
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i)))) {
i <- i+1
}
attr(.data2, ".name") <- deparse(parent.frame(i)$lhs)
}
dplyr::filter(.data2, ...)
}
For convenience it is good to have some helper function to extract the original name from the attributes easily.
.name <- function(.data) {
if("meta_tbl" %in% class(.data)) {
attr(.data, ".name")
} else stop("this function only work on objects of class 'meta_tbl'")
}
Both functions can be used in a workflow in the following way:
mtcars %>%
filter(gear == 4) %>%
write.csv(paste0(.name(.), ".csv"))
This might be a bad example, since the pipe doesn't continue, but in theory, we could use this pipe including the original name and pipe it in further function calls.
The SO answer that JBGruber links to in the comments mostly solves the problem. It works by moving upwards through execution environments until a certain variable is found, then returns the lhs
from that environment. The only thing missing is the requirement that the function outputs both the name of the original data frame and the manipulated data – I gleaned the latter requirement from one of the OP's comments. For that we just need to output a list containing these things, which we can do by modifying MrFlick's answer:
get_orig_name <- function(df){
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
list(name = deparse(parent.frame(i)$lhs), output = df)
}
Now we can run get_orig_name
to the end of any pipeline to the get the manipulated data and the original data frame's name in a list. We access both using $
:
mtcars %>% summarize_all(mean) %>% get_orig_name
#### OUTPUT ####
$name
[1] "mtcars"
$output
mpg cyl disp hp drat wt qsec vs am gear carb
1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
I should also mention that, although I think the details of this strategy are interesting, I also think it is needlessly complicated. It sounds like the OP's goal is to manipulate the data and then write it to a file with the same name as the original, unmanipulated, data frame, which can easily be done using more straightforward methods. For example, if we are dealing with multiple data frames we can just do something like the following:
df_list <- list(mtcars = mtcars, iris = iris)
for(name in names(df_list)){
df_list[[name]] %>%
group_by_if(is.factor) %>%
summarise_all(mean) %>%
write.csv(paste0(name, ".csv"))
}