Ignoring NA values in function

拟墨画扇 submitted on 2021-02-10 14:23:07


I'm writing my own function to calculate the mean of a column in a data set and then applying it with apply(), but it only returns the first column's mean. Below is my code:

mymean <- function(cleaned_us){
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  return (column_total/column_length)
}

Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T)


We need to pass na.rm = TRUE to sum() inside the function. Passing it through apply() is not going to work, because mymean doesn't have that argument. The length also has to count only the non-NA values:

mymean <- function(cleaned_us){
   column_total = sum(cleaned_us, na.rm = TRUE) #change
   column_length = sum(!is.na(cleaned_us)) #change
   return (column_total/column_length)
}

Note that colMeans() can be used to get the mean of each column directly, e.g. colMeans(numeric_clean_usnews, na.rm = TRUE).
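
As a quick illustration of the colMeans() suggestion, here is a minimal sketch on a made-up matrix. The name `toy` and its values are hypothetical stand-ins for numeric_clean_usnews, which isn't shown in the question.

```r
# Hypothetical stand-in for numeric_clean_usnews: 3 rows, 2 columns, with NAs.
# matrix() fills column by column, so a = (1, NA, 3) and b = (4, 6, NA).
toy <- matrix(c(1, NA, 3,
                4, 6, NA),
              nrow = 3,
              dimnames = list(NULL, c("a", "b")))

# Column means that skip missing values
col_means <- colMeans(toy, na.rm = TRUE)

# The same computation via apply() and the built-in mean(),
# which does accept an na.rm argument
apply_means <- apply(toy, 2, mean, na.rm = TRUE)

print(col_means)    # a = 2, b = 5
print(apply_means)  # same values
```

For a plain numeric matrix or data frame, colMeans() is usually the shorter option. The apply() form is mainly useful when you want to swap in your own function.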


In order to pass an na.rm parameter to the function you defined, you need to make it a parameter of the function. The sum() function has an na.rm param, but length() doesn't. So to write the function you are trying to write, you could say:

# include `na.rm` as a parameter of the function
mymean <- function(cleaned_us, na.rm){

  # pass it to `sum()`
  column_total = sum(cleaned_us, na.rm=na.rm)

  # if `na.rm` is set to `TRUE`, then don't count `NA`s
  if (na.rm==TRUE){
    column_length = length(cleaned_us[!is.na(cleaned_us)])

  # but if it's `FALSE`, just use the full length
  } else {
    column_length = length(cleaned_us)
  }

  return (column_total/column_length)
}

Then your call should work:

Average_2 <- apply(numeric_clean_usnews, 2, mymean, na.rm=TRUE)
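
To see why the parameter has to be declared, here is a small sketch of what happens when it isn't. The names `toy` and `no_narm` are hypothetical; `no_narm` mimics the function from the question. apply() forwards any extra arguments to the function it calls, so a function that doesn't accept na.rm should fail with an "unused argument" error.

```r
# Hypothetical toy data with some missing values
toy <- matrix(c(1, NA, 3, 4, 6, NA), nrow = 3)

# A mean function without an na.rm parameter, like the one in the question
no_narm <- function(x) sum(x) / length(x)

# apply() passes na.rm = TRUE on to no_narm(), which cannot accept it;
# tryCatch() captures the resulting error message instead of stopping
msg <- tryCatch(
  apply(toy, 2, no_narm, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

print(msg)
```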


Alternatively, use na.omit() to drop the NA values before summing:

m <- matrix(sample(c(1:9, NA), 100, replace=TRUE), 10)

mymean <- function(cleaned_us, na.rm){
    if (na.rm) cleaned_us <- na.omit(cleaned_us)
    column_total = sum(cleaned_us)
    column_length = length(cleaned_us)
    return (column_total/column_length)
}

apply(m, 2, mymean, na.rm=TRUE)

# [1] 5.000 5.444 4.111 5.700 6.500 4.600 5.000 6.222 4.700 6.200
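
Since the matrix above is sampled without a seed, the printed numbers will differ from run to run. A reproducible way to check the approach is to compare it against colMeans(). The sketch below assumes an arbitrary seed and adds a default of na.rm = FALSE, neither of which is in the original answer.

```r
set.seed(42)  # arbitrary seed, only so that the sample is reproducible
m <- matrix(sample(c(1:9, NA), 100, replace = TRUE), 10)

# Same idea as the na.omit() version, with an assumed default for na.rm
mymean <- function(cleaned_us, na.rm = FALSE) {
  if (na.rm) cleaned_us <- na.omit(cleaned_us)  # drop NAs before summing
  sum(cleaned_us) / length(cleaned_us)
}

res <- apply(m, 2, mymean, na.rm = TRUE)

# Compare against the built-in column means; this should print TRUE
print(all.equal(res, colMeans(m, na.rm = TRUE)))
```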

