Question
For functions like max, the option na.rm is set to FALSE by default. I understand why this is a good idea in general, but I'd like to turn it off reversibly for a while, i.e. during a session.
How can I require R to set na.rm = TRUE whenever it is an option? I found options(na.action = na.omit), but this doesn't work. I know that I can set the na.rm=TRUE option in each and every function I write:
my.max <- function(x) {max(x, na.rm=TRUE)}
But that's not what I am looking for. I'm wondering if there's something I could do more globally/universally instead of doing it for each function.
Answer 1:
One workaround (dangerous) is to do the following:
- List all functions that have na.rm as an argument. Here I limited my search to the base package.
- Fetch each function and add this line at the beginning of its body: na.rm = TRUE
- Assign the function back to the base package.
So first I store in a vector (na.rm.f) the names of all functions having na.rm as an argument:
uses_arg <- function(x, arg) {
  is.function(fx <- get(x)) &&
    arg %in% names(formals(fx))
}
basevals <- ls(pos = "package:base")
na.rm.f <- basevals[sapply(basevals, uses_arg, 'na.rm')]
EDIT: a better method to get all functions with an na.rm argument (thanks to mnel's comment):
Funs <- Filter(is.function,sapply(ls(baseenv()),get,baseenv()))
na.rm.f <- names(Filter(function(x) any(names(formals(args(x)))%in% 'na.rm'),Funs))
So na.rm.f looks like:
[1] "all" "any" "colMeans" "colSums"
[5] "is.unsorted" "max" "mean.default" "min"
[9] "pmax" "pmax.int" "pmin" "pmin.int"
[13] "prod" "range" "range.default" "rowMeans"
[17] "rowsum.data.frame" "rowsum.default" "rowSums" "sum"
[21] "Summary.data.frame" "Summary.Date" "Summary.difftime" "Summary.factor"
[25] "Summary.numeric_version" "Summary.ordered" "Summary.POSIXct" "Summary.POSIXlt"
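A short sketch of why the edited version goes through args(): primitives such as max have no formals of their own, so calling formals() on them directly finds nothing and the first approach would skip them. The variable names direct and via_args are only illustrative.

```r
# max is a primitive, so formals() on it directly returns NULL
direct <- formals(max)

# args() returns a closure with the same argument list, which formals() can read
via_args <- names(formals(args(max)))

print(is.null(direct))          # TRUE
print("na.rm" %in% via_args)    # TRUE
```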
Then, for each function, I change the body. The code is inspired by the data.table package (FAQ 2.23), which adds one line to the start of rbind.data.frame and cbind.data.frame.
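ll <- lapply(na.rm.f, function(x)
{
  tt <- get(x)
  ss = body(tt)
  # wrap a single-expression body in { } so a line can be prepended
  if (class(ss) != "{") ss = as.call(c(as.name("{"), ss))
  # empty body (e.g. primitives such as max): nothing to patch, just report the name
  if (length(ss) < 2) print(x)
  else {
    # only patch functions that have not been patched already
    if (!length(grep("na.rm = TRUE", ss[[2]], fixed = TRUE))) {
      ss = ss[c(1, NA, 2:length(ss))]
      ss[[2]] = parse(text = "na.rm = TRUE")[[1]]
      body(tt) = ss
      # write the modified function back into the locked base namespace
      (unlockBinding)(x, baseenv())
      assign(x, tt, envir = asNamespace("base"), inherits = FALSE)
      lockBinding(x, baseenv())
    }
  }
})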
Now, if you check the first line of each function in our list:
unique(lapply(na.rm.f,function(x) body(get(x))[[2]]))
[[1]]
na.rm = TRUE
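The body-rewriting step can be tried safely on a throwaway function before touching anything in base. The following is a minimal sketch under that assumption; f is a hypothetical function, not part of the answer's code, and nothing in any package is modified.

```r
# Hypothetical function standing in for a base function with an na.rm argument
f <- function(x, na.rm = FALSE) {
  sum(x, na.rm = na.rm)
}

# Turn the { ... } body into a list, insert the new statement right after `{`,
# and convert it back into a call
b <- as.list(body(f))
body(f) <- as.call(append(b, list(quote(na.rm <- TRUE)), after = 1))

patched_result <- f(c(1, NA, 2))   # na.rm is now forced to TRUE inside f
print(patched_result)              # 3
```

Because only the local copy f is changed, removing it with rm(f) undoes the experiment completely.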
Answer 2:
It is not possible to change na.rm to TRUE globally. (See Hong Ooi's comment under the question.)
EDIT:
Unfortunately, the answer you don't want is the only one that works generally. There's no global option for this like there is for na.action, which only affects modeling functions like lm, glm, etc (and even there, it isn't guaranteed to work in all cases). – Hong Ooi Jul 2 '13 at 6:23
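To illustrate the comment, here is a small sketch showing that the na.action option is consulted by a modeling function such as lm but has no effect on max. The data frame d and the other variable names are made up for the example.

```r
old <- options(na.action = "na.omit")   # already the default in a standard session

x <- c(2, NA, 3)
max_result <- max(x)                    # summary functions ignore na.action

d <- data.frame(x = c(1, 2, NA, 4), y = c(2, 4, 6, 8))
fit <- lm(y ~ x, data = d)              # modeling functions consult na.action
n_used <- nobs(fit)                     # number of rows actually used in the fit

options(old)                            # restore the previous setting

print(max_result)                       # NA
print(n_used)                           # 3
```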
Answer 3:
For my R package, I overwrote the existing functions mean and sum. Thanks to the great Ben (comments below), I altered my functions to this:
mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}
After this, mean(c(2, NA, 3)) = 2.5 instead of NA.
And for sum:
sum <- function(x, ..., na.rm = TRUE) {
  base::sum(x, ..., na.rm = na.rm)
}
This will yield sum(c(2, NA, 3)) = 5 instead of NA. sum(c(2, NA, 3, NaN)) also works.
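Since the question asks for something reversible within a session, one way to use this masking approach outside a package is sketched below. It assumes the override is defined in the current workspace, so that removing it with rm() makes the base version visible again; the variable names are illustrative.

```r
# Mask base::mean in the current environment with an na.rm = TRUE default
mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}
with_override <- mean(c(2, NA, 3))   # uses the masking function

# Remove the mask; lookups fall through to base::mean again
rm(mean)
restored <- mean(c(2, NA, 3))

print(with_override)                 # 2.5
print(restored)                      # NA
```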
Answer 4:
Several answers already cover changing the na.rm argument globally. I just want to point out the partial() function from the purrr or pryr packages. Using this function, you can create a copy of an existing function with predefined arguments:
library(purrr)
.mean <- partial(mean, na.rm = TRUE)
# Create sample vector
df <- c(1, 2, 3, 4, NA, 6, 7)
mean(df)
>[1] NA
.mean(df)
>[1] 3.833333
We can combine this tip with @agstudy's answer and create copies of all functions with the na.rm = TRUE argument:
library(purrr)
# Create a vector of function names https://stackoverflow.com/a/17423072/9300556
Funs <- Filter(is.function,sapply(ls(baseenv()),get,baseenv()))
na.rm.f <- names(Filter(function(x) any(names(formals(args(x)))%in% 'na.rm'),Funs))
# Create strings. Dot "." is optional
fs <- lapply(na.rm.f,
             function(x) paste0(".", x, "=partial(", x, ", na.rm = TRUE)"))
# parse() expects a character vector, so flatten the list first
eval(parse(text = unlist(fs)))
So now there are .all, .min, .max, etc. in our .GlobalEnv. You can run them:
.min(df)
> [1] 1
.max(df)
> [1] 7
.all(df)
> [1] TRUE
To overwrite the functions, just remove the dot "." from the lapply call. Inspired by this blog post.
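If you would rather avoid eval(parse()) and an extra package, the same idea can be sketched in base R with a closure. This is only an illustration: with_na_rm and the list na_rm are hypothetical names, and only four functions are wrapped to keep it short.

```r
# Hypothetical helper: returns a wrapper around f that always passes na.rm = TRUE
with_na_rm <- function(f) {
  force(f)
  function(...) f(..., na.rm = TRUE)
}

# Build the wrappers and keep them in a named list instead of the global environment
fun_names <- c("min", "max", "sum", "mean")
na_rm <- lapply(fun_names, function(nm) with_na_rm(get(nm, envir = baseenv())))
names(na_rm) <- fun_names

df <- c(1, 2, 3, 4, NA, 6, 7)
print(na_rm$min(df))    # 1
print(na_rm$max(df))    # 7
print(na_rm$mean(df))   # 3.833333
```

Keeping the wrappers in a list leaves the original functions untouched, so nothing has to be undone at the end of the session. Note that passing na.rm explicitly to one of these wrappers would raise an error, because the argument would then be supplied twice.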
Source: https://stackoverflow.com/questions/17418640/is-it-possible-to-set-na-rm-to-true-globally