In R, mean()
and median()
are standard functions which do what you\'d expect. mode()
tells you the internal storage mode of the objec
I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution :
function(x) density(x, 2)$x[density(x, 2)$y == max(density(x, 2)$y)]
where x is the data collection. Pay attention to the adjust paremeter of the density function which regulate the smoothing.
Based on @Chris's function to calculate the mode or related metrics, however using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method
names.
Mode <- function(x, method = "one", na.rm = FALSE) {
x <- unlist(x)
if (na.rm) {
x <- x[!is.na(x)]
}
# Get unique values
ux <- unique(x)
n <- length(ux)
# Get frequencies of all unique values
frequencies <- tabulate(match(x, ux))
modes <- frequencies == max(frequencies)
# Determine number of modes
nmodes <- sum(modes)
nmodes <- ifelse(nmodes==n, 0L, nmodes)
if (method %in% c("one", "mode", "") | is.na(method)) {
# Return NA if not exactly one mode, else return the mode
if (nmodes != 1) {
return(NA)
} else {
return(ux[which(modes)])
}
} else if (method %in% c("n", "nmodes")) {
# Return the number of modes
return(nmodes)
} else if (method %in% c("all", "modes")) {
# Return NA if no modes exist, else return all modes
if (nmodes > 0) {
return(ux[which(modes)])
} else {
return(NA)
}
}
warning("Warning: method not recognised. Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}
Since it uses Ken's method to calculate frequencies the performance is also optimised, using AkselA's post I benchmarked some of the previous answers as to show how my function is close to Ken's in performance, with the conditionals for the various ouput options causing only minor overhead:
Another simple option that gives all values ordered by frequency is to use rle
:
df = as.data.frame(unclass(rle(sort(mySamples))))
df = df[order(-df$lengths),]
head(df)
Here, another solution:
freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])
R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.
However the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use some construct like the following (and to turn this to a function if you use often...):
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
tabSmpl<-tabulate(mySamples)
SmplMode<-which(tabSmpl== max(tabSmpl))
if(sum(tabSmpl == max(tabSmpl))>1) SmplMode<-NA
> SmplMode
[1] 19
For bigger sample list, one should consider using a temporary variable for the max(tabSmpl) value (I don't know that R would automatically optimize this)
Reference: see "How about median and mode?" in this KickStarting R lesson
This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well... mode() as you found out is used for asserting the type of variables).
A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continous univariate distribution (e.g. a normal distribution) is defining and using the following function:
estimate_mode <- function(x) {
d <- density(x)
d$x[which.max(d$y)]
}
Then to get the mode estimate:
x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
estimate_mode(x)
## 5.439788