I'm inside a big function I have to write. In the last part I have to calculate the mean of a column in a data frame. The name of the column I am operating on is given as a
If your column contains missing values (NA) that you want to ignore, passing na.rm = TRUE will help:
## da is the data frame and Ozone is the column name
## for a single column
mean(da$Ozone, na.rm = TRUE)
## for all columns (they must all be numeric)
colMeans(x=da, na.rm = TRUE)
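To see what na.rm changes, here is a small sketch on a made-up data frame. The name da_demo and its values are illustrative only, not from the question. Because colMeans() expects numeric input, the sketch selects the numeric columns first; that step is my assumption about what you would want when column types are mixed.

```r
# Hypothetical data frame with a missing value and a non-numeric column
da_demo <- data.frame(
  Ozone = c(41, NA, 12, 18),
  Wind  = c(7.4, 8.0, 12.6, 11.5),
  Site  = c("a", "b", "c", "d")
)

# Without na.rm, a single NA makes the result NA
mean_with_na <- mean(da_demo$Ozone)

# With na.rm = TRUE, the NA is dropped before averaging
mean_without_na <- mean(da_demo$Ozone, na.rm = TRUE)

# colMeans() needs numeric input, so keep only the numeric columns
numeric_cols <- vapply(da_demo, is.numeric, logical(1))
col_means <- colMeans(da_demo[, numeric_cols, drop = FALSE], na.rm = TRUE)
```

Here mean_with_na is NA, while mean_without_na averages only the three observed Ozone values.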
Suppose you have a data frame (say df) with columns "x" and "y". You can find the mean of a column (x or y) in any of the following ways:
1. Using the mean() function
z <- mean(df$x)
2. Using the column name (say x) as a variable, via the attach() function
attach(df)
mean(x)
When done, you can call detach() to remove "x" from the search path:
detach(df)
3. Using the with() function, which lets you use the columns of a data frame as distinct variables
z <- with(df, mean(x))
I think what you are being asked to do (or perhaps asking yourself?) is take a character value which matches the name of a column in a particular data frame (possibly also given as a character value). There are two tricks here. Most people learn to extract columns with the "$" operator, and that won't work inside a function if the function is passed a character vector. If the function is also supposed to accept the data frame's name as a character argument, then you will need to use the get function as well:
df1 <- data.frame(a=1:10, b=11:20)
mean_col <- function( dfrm, col ) mean( get(dfrm)[[ col ]] )
mean_col("df1", "b")
# [1] 15.5
There is sort of a semantic boundary between ordinary objects like character vectors and language objects like the names of objects. The get function is one of the functions that lets you "promote" character values to language-level evaluation. The "$" function will NOT evaluate its argument, so inside a function dfrm$col looks for a column literally named "col"; you need to use "[[" instead. "$" is really only useful at the console level and is best avoided in functions that receive the column name in a variable.
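The difference between "$" and "[[" can be seen in a short sketch. The names df_demo, mean_dollar and mean_bracket are made up for illustration, and unlike mean_col above, these versions take the data frame itself rather than its name, so get is not needed.

```r
df_demo <- data.frame(a = 1:10, b = 11:20)

# `$` takes its argument literally: this looks for a column named "col"
mean_dollar <- function(dfrm, col) mean(dfrm$col)

# `[[` evaluates its argument, so the value stored in `col` is used
mean_bracket <- function(dfrm, col) mean(dfrm[[col]])

# dfrm$col is NULL here, so mean() warns and returns NA
res_dollar <- suppressWarnings(mean_dollar(df_demo, "b"))

# Picks column "b" as intended
res_bracket <- mean_bracket(df_demo, "b")
```

res_dollar comes back as NA because there is no column called "col", while res_bracket is the mean of column b.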
Any of the following should work:
df <- data.frame(x=1:3,y=4:6)
mean(df$x)
mean(df[,1])
mean(df[["x"]])
Use summarise in the dplyr package (col_name here is the bare, unquoted column name):
library(dplyr)
summarise(df, Average = mean(col_name, na.rm = TRUE))
Note: dplyr supports both summarise and summarize.
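Since the question has the column name as a string rather than a bare name, one possible way to adapt the summarise call is the .data pronoun that dplyr provides. This is only a sketch: df_demo, col and result are illustrative names, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data frame with a missing value
df_demo <- data.frame(x = c(1, 2, NA), y = 4:6)
col <- "x"  # column name held as a string

# .data[[col]] looks the column up by the string stored in `col`
result <- summarise(df_demo, Average = mean(.data[[col]], na.rm = TRUE))
```

result is a one-row data frame whose Average column holds the mean of the non-missing values of x.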
I think you're asking how to compute the mean of a variable in a data frame, given the name of the column. There are two typical approaches to doing this, one indexing with [[ and the other indexing with [:
data(iris)
mean(iris[["Petal.Length"]])
# [1] 3.758
mean(iris[,"Petal.Length"])
# [1] 3.758
mean(iris[["Sepal.Width"]])
# [1] 3.057333
mean(iris[,"Sepal.Width"])
# [1] 3.057333
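Wrapped in a function, the [[ approach might look like the sketch below. col_mean is a made-up helper name, and the input checks are my own additions rather than anything the question requires.

```r
# Hypothetical helper: mean of the column whose name is given as a string
col_mean <- function(dfrm, col, na.rm = FALSE) {
  # Fail early on bad input instead of silently returning NA
  stopifnot(is.data.frame(dfrm), is.character(col), length(col) == 1)
  if (!col %in% names(dfrm)) {
    stop("No column named '", col, "' in the data frame")
  }
  mean(dfrm[[col]], na.rm = na.rm)
}

data(iris)
petal_mean <- col_mean(iris, "Petal.Length")
```

Checking the name up front means a misspelled column gives a clear error rather than a confusing one from mean().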