What is the difference between . and .data?

挽巷 2021-02-09 03:46

I'm trying to develop a deeper understanding of using the dot (".") and the .data pronoun with dplyr. The code

4 answers
  •  情深已故
    2021-02-09 03:59

    Up front, I think .data's intent is a little confusing until one also considers its sibling pronoun, .env.

    The dot . is a placeholder that magrittr::%>% sets up and uses; since dplyr re-exports %>%, it is available whenever dplyr is loaded. Whenever you reference it, it is a real object (the pipe's left-hand side), so names(.), nrow(.), etc., all work as expected. It does reflect the data up to this point in the pipeline.
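    A minimal sketch of that behavior, assuming dplyr is installed (the variable names dot_names, same_obj, four_cyl and rows_so_far are illustrative only). Because . is the actual data frame at that step, ordinary functions can be applied to it, and after a filter() it holds only the surviving rows:

    ```r
    library(dplyr)

    # `.` is bound to the pipe's left-hand side, so base functions work on it directly
    dot_names <- mtcars %>% names(.)
    same_obj  <- mtcars %>% identical(., mtcars)

    # `.` reflects the data as it arrives at this step: after filter(), only the
    # 4-cylinder rows remain, so nrow(.) counts those rows (recycled to every row)
    four_cyl <- mtcars %>%
      filter(cyl == 4) %>%
      mutate(rows_so_far = nrow(.))
    ```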

    .data, on the other hand, is defined within rlang for the purpose of disambiguating symbol resolution. Along with .env, it allows you to be perfectly clear on where you want a particular symbol resolved (when ambiguity is expected). From ?.data, I think this is a clarifying contrast:

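    library(dplyr)
    disp <- 10
    # .data$disp is the mtcars column; .env$disp is the variable above (10)
    mtcars %>% mutate(disp = .data$disp * .env$disp)
    # both bare `disp` symbols resolve to the column, so this squares it
    mtcars %>% mutate(disp = disp * disp)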
    

    However, as stated in the help pages, .data (and .env) is just a "pronoun" (we have verbs, so now we have pronouns too): a pointer that tells the tidy-evaluation internals where the symbol should be resolved. It is a hint of sorts, not a container of data.
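    The ambiguity these hints resolve shows up most often inside functions, where an argument can share its name with a column. A minimal sketch, assuming dplyr is installed; scale_disp is a hypothetical helper, not part of any package:

    ```r
    library(dplyr)

    # Hypothetical helper: its argument `disp` shares its name with an mtcars column
    scale_disp <- function(df, disp) {
      df %>% mutate(
        scaled  = .data$disp * .env$disp,  # column times the function argument
        squared = disp * disp              # bare symbols: the column wins both times
      )
    }

    out <- scale_disp(mtcars, disp = 10)
    ```

    The data itself never passes through .data or .env here; they only decide which of the two disp bindings each reference picks up.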

    So your statement

    both . and .data just mean "our result up to this point in the pipeline."

    is not correct: . represents the data up to this point, whereas .data is only a declarative hint to the internals.


    Consider another way of thinking about .data: suppose we had two (hypothetical) functions that completely disambiguate the environment a symbol is resolved against:

    • get_internally: the symbol must always refer to a column name; it will not reach out to the enclosing environment if the column does not exist; and
    • get_externally: the symbol must always refer to a variable/object in the enclosing environment; it will never match a column.

    In that case, translating the above examples, one might use

    # get_internally() and get_externally() are hypothetical, for illustration only
    disp <- 10
    mtcars %>%
      mutate(disp = get_internally(disp) * get_externally(disp))
    

    In that case, it seems more obvious that get_internally is not a frame, so you cannot call names(get_internally) and expect anything meaningful (other than NULL); it would be like calling names(mutate).
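    A small check along these lines, assuming dplyr (and its dependency rlang) is installed: inside a verb, .data$mpg yields the column, while the placeholder object rlang exports under the name .data raises an error when subset outside a data-masking verb, instead of behaving like a data frame:

    ```r
    library(dplyr)

    # Inside a data-masking verb, .data$mpg is resolved against the data's columns
    inside <- mtcars %>% summarise(m = mean(.data$mpg))

    # Outside a verb there is no data to point at; subsetting rlang's exported
    # placeholder errors, a reminder that .data is not itself a data frame
    outside <- tryCatch(rlang::.data$mpg, error = function(e) "error")
    ```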

    So don't think of .data as an object; think of it as a mechanism for disambiguating the environment in which the symbol is resolved. I think the $ it uses is both terse and easy to use, and absolutely misleading: .data is not a list-like or environment-like object, even though it is being treated as one.
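    The mechanism is easiest to see when a name exists in only one place. In this sketch (assuming dplyr is installed; threshold is an arbitrary example variable), a bare symbol quietly falls back to the calling environment, while .data$ behaves like the hypothetical get_internally above and refuses to leave the data:

    ```r
    library(dplyr)

    threshold <- 20  # exists only in the calling environment, not as a column

    # Bare symbol: not a column, so the data mask falls back to the environment
    loose <- mtcars %>% filter(mpg > threshold)

    # .data$threshold must be a column; it never falls back, so this errors
    strict <- tryCatch(
      mtcars %>% filter(mpg > .data$threshold),
      error = function(e) "error"
    )
    ```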

    BTW: one can write an S3 method for $ that makes any classed object look like a frame/environment:

    `$.quux` <- function(x, nm) paste0("hello, ", nm, "!")
    obj <- structure(0, class = "quux")
    obj$r2evans
    # [1] "hello, r2evans!"
    names(obj)
    # NULL
    

    (The presence of a $ accessor does not always mean the object is a frame/env.)
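    Taking the same trick one step further, here is a toy, hypothetical "pronoun" (plain base R; make_toy_pronoun and the toy_pronoun class are invented for illustration, and this is not how rlang implements .data). It stores only a lookup rule, yet $ makes it read like a data frame, while names() reveals nothing about the columns:

    ```r
    # Hypothetical toy "pronoun": it holds a lookup rule, not the data itself
    make_toy_pronoun <- function(df) {
      force(df)
      structure(list(lookup = function(name) {
        if (!name %in% names(df)) stop("Column `", name, "` not found")
        df[[name]]
      }), class = "toy_pronoun")
    }

    # `$` forwards the name (passed as a string) to the stored rule;
    # .subset2() avoids recursing back into this `$` method
    `$.toy_pronoun` <- function(x, name) .subset2(x, "lookup")(name)

    cols <- make_toy_pronoun(mtcars)
    disp_values <- cols$disp  # same values as mtcars$disp
    toy_names <- names(cols)  # "lookup": names() knows nothing about the columns
    ```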
