Question
I'm writing my own function to calculate the mean of a column in a data set and then applying it with apply(), but it only returns the first column's mean. Below is my code:
mymean <- function(cleaned_us){
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  return (column_total/column_length)
}
Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T)
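A minimal reproduction on a small made-up data frame (df is hypothetical, standing in for numeric_clean_usnews) points at two separate issues. First, apply() forwards any extra arguments such as na.rm = T on to the function it calls, and mymean has no na.rm argument, so the call should fail with an "unused argument" error. Second, even without na.rm, a column containing an NA comes back as NA, because sum() propagates NA by default.

```r
# Hypothetical stand-in for numeric_clean_usnews: two numeric columns, one NA
df <- data.frame(a = c(1, 2, 3), b = c(4, NA, 6))

# The function from the question, unchanged
mymean <- function(cleaned_us){
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  return (column_total/column_length)
}

# 1) apply() passes na.rm = T on to mymean(), which has no such argument
res <- tryCatch(apply(df, 2, mymean, na.rm = T),
                error = function(e) conditionMessage(e))
print(res)  # an "unused argument" error message

# 2) Without na.rm, the column containing an NA is NA
without_narm <- apply(df, 2, mymean)
print(without_narm)
```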
Answer 1:
We need to use na.rm = TRUE inside the sum(), and the length must count only the non-NA values. Passing na.rm through apply() is not going to work, as mymean doesn't have that argument:
mymean <- function(cleaned_us){
  column_total = sum(cleaned_us, na.rm = TRUE) # change: drop NAs from the total
  column_length = sum(!is.na(cleaned_us))      # change: count only non-NA values
  return(column_total/column_length)
}
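With this version, the apply() call should no longer pass na.rm, because the NA handling now happens inside mymean. A minimal check on a small made-up matrix m (hypothetical data), comparing the result against base R's mean() with na.rm = TRUE:

```r
# Answer 1's version: NA handling lives inside the function
mymean <- function(cleaned_us){
  column_total = sum(cleaned_us, na.rm = TRUE) # ignore NAs in the total
  column_length = sum(!is.na(cleaned_us))      # count only non-NA values
  return(column_total/column_length)
}

# Hypothetical example data: 3 rows, 2 columns, one NA in the second column
m <- matrix(c(1, 2, 3,
              4, NA, 6), ncol = 2)

# No na.rm in the call: mymean() has no such argument
res <- apply(m, 2, mymean)
print(res)

# Base mean() has its own na.rm argument, so apply() can forward it there
ref <- apply(m, 2, mean, na.rm = TRUE)
stopifnot(isTRUE(all.equal(res, ref)))
```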
Note that colMeans(numeric_clean_usnews, na.rm = TRUE) can be used to get the mean of each column directly.
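For comparison, a sketch of the colMeans() route on a small made-up data frame (hypothetical data). colMeans() takes its own na.rm argument and requires every column to be numeric; that numeric_clean_usnews meets this is an assumption based on its name.

```r
# Hypothetical all-numeric data frame with one NA
df <- data.frame(a = c(1, 2, 3), b = c(4, NA, 6))

# Default na.rm = FALSE: a column containing an NA gives NA
print(colMeans(df))

# na.rm = TRUE drops NAs column by column before averaging
avg <- colMeans(df, na.rm = TRUE)
print(avg)
```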
Answer 2:
In order to pass an na.rm parameter to the function you defined, you need to make it a parameter of that function. The sum() function has an na.rm parameter, but length() doesn't. So to write the function you are trying to write, you could say:
# include `na.rm` as an argument of the function (defaulting to FALSE, as in sum())
mymean <- function(cleaned_us, na.rm = FALSE){
  # pass it to `sum()`
  column_total = sum(cleaned_us, na.rm = na.rm)
  if (na.rm){
    # if `na.rm` is `TRUE`, then don't count `NA`s
    column_length = length(cleaned_us[!is.na(cleaned_us)])
  } else {
    # but if it's `FALSE`, just use the full length
    column_length = length(cleaned_us)
  }
  return (column_total/column_length)
}
Then your call should work:
Average_2 <- apply(numeric_clean_usnews, 2, mymean, na.rm=TRUE)
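A quick check of this approach on a small made-up matrix (hypothetical data), showing that both settings of na.rm behave like base mean(). The function below gives na.rm a default of FALSE, mirroring sum() and mean(), so it can also be called without the argument.

```r
# Answer 2's approach: na.rm is an argument, passed on to sum()
mymean <- function(cleaned_us, na.rm = FALSE){
  column_total = sum(cleaned_us, na.rm = na.rm)
  if (na.rm) {
    # count only the non-NA values
    column_length = length(cleaned_us[!is.na(cleaned_us)])
  } else {
    # keep every element, so any NA propagates into the result
    column_length = length(cleaned_us)
  }
  column_total/column_length
}

# Hypothetical example data: 3 rows, 2 columns, one NA in the second column
m <- matrix(c(1, 2, 3,
              4, NA, 6), ncol = 2)

with_rm    <- apply(m, 2, mymean, na.rm = TRUE)  # NAs ignored
without_rm <- apply(m, 2, mymean, na.rm = FALSE) # NA propagates

# Both should agree with base mean() under the same setting
stopifnot(isTRUE(all.equal(with_rm, apply(m, 2, mean, na.rm = TRUE))))
stopifnot(isTRUE(all.equal(without_rm, apply(m, 2, mean, na.rm = FALSE))))
```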
Answer 3:
Use na.omit() to drop the NA values before summing and counting:
set.seed(1)
m <- matrix(sample(c(1:9, NA), 100, replace=TRUE), 10)
mymean <- function(cleaned_us, na.rm){
  if (na.rm) cleaned_us <- na.omit(cleaned_us)
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  column_total/column_length
}
apply(m, 2, mymean, na.rm=TRUE)
# [1] 5.000 5.444 4.111 5.700 6.500 4.600 5.000 6.222 4.700 6.200
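One edge case worth checking with any of these versions is a column that is entirely NA. After na.omit() nothing is left, so the function computes 0/0 and returns NaN, which matches what base mean() does with na.rm = TRUE. A minimal sketch using a made-up matrix m2 (hypothetical data):

```r
# Answer 3's version, unchanged apart from comments
mymean <- function(cleaned_us, na.rm){
  # na.omit() drops NA values from the column vector
  if (na.rm) cleaned_us <- na.omit(cleaned_us)
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  column_total/column_length
}

# Hypothetical data: the second column is all NA
m2 <- matrix(c(1, 2, 3,
               NA, NA, NA), ncol = 2)

res <- apply(m2, 2, mymean, na.rm = TRUE)
print(res)  # first column is 2; second is NaN (0/0)

# Base mean() gives the same result for an all-NA column
stopifnot(isTRUE(all.equal(res, apply(m2, 2, mean, na.rm = TRUE))))
```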
Source: https://stackoverflow.com/questions/47241761/ignoring-na-values-in-function