How to find the statistical mode?

前端未结

关注

 30  1776

In R, mean() and median() are standard functions which do what you\'d expect. mode() tells you the internal storage mode of the objec

相关标签:

30条回答

小鲜肉

2020-11-21 07:56
I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution :
```
function(x) density(x, 2)$x[density(x, 2)$y == max(density(x, 2)$y)]
```
where x is the data collection. Pay attention to the adjust paremeter of the density function which regulate the smoothing.
0 讨论(0)
发布评论:

提交评论
- 加载中...

梦如初夏

2020-11-21 07:58

Based on @Chris's function to calculate the mode or related metrics, however using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method names.

Mode <- function(x, method = "one", na.rm = FALSE) {
  x <- unlist(x)
  if (na.rm) {
    x <- x[!is.na(x)]
  }

  # Get unique values
  ux <- unique(x)
  n <- length(ux)

  # Get frequencies of all unique values
  frequencies <- tabulate(match(x, ux))
  modes <- frequencies == max(frequencies)

  # Determine number of modes
  nmodes <- sum(modes)
  nmodes <- ifelse(nmodes==n, 0L, nmodes)

  if (method %in% c("one", "mode", "") | is.na(method)) {
    # Return NA if not exactly one mode, else return the mode
    if (nmodes != 1) {
      return(NA)
    } else {
      return(ux[which(modes)])
    }
  } else if (method %in% c("n", "nmodes")) {
    # Return the number of modes
    return(nmodes)
  } else if (method %in% c("all", "modes")) {
    # Return NA if no modes exist, else return all modes
    if (nmodes > 0) {
      return(ux[which(modes)])
    } else {
      return(NA)
    }
  }
  warning("Warning: method not recognised.  Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}

Since it uses Ken's method to calculate frequencies the performance is also optimised, using AkselA's post I benchmarked some of the previous answers as to show how my function is close to Ken's in performance, with the conditionals for the various ouput options causing only minor overhead:

0 讨论(0)

伪装坚强ぢ

2020-11-21 07:58
Another simple option that gives all values ordered by frequency is to use rle:
```
df = as.data.frame(unclass(rle(sort(mySamples))))
df = df[order(-df$lengths),]
head(df)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

余生分开走

2020-11-21 08:00

Here, another solution:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])

0 讨论(0)

花落未央

2020-11-21 08:00
R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use some construct like the following (and to turn this to a function if you use often...):
```
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
tabSmpl<-tabulate(mySamples)
SmplMode<-which(tabSmpl== max(tabSmpl))
if(sum(tabSmpl == max(tabSmpl))>1) SmplMode<-NA
> SmplMode
[1] 19
```
For bigger sample list, one should consider using a temporary variable for the max(tabSmpl) value (I don't know that R would automatically optimize this)

Reference: see "How about median and mode?" in this KickStarting R lesson
This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well... mode() as you found out is used for asserting the type of variables).
0 讨论(0)
发布评论:

提交评论
- 加载中...
感动是毒

2020-11-21 08:01
A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continous univariate distribution (e.g. a normal distribution) is defining and using the following function:
```
estimate_mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}
```
Then to get the mode estimate:
```
x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
estimate_mode(x)
## 5.439788
```
0 讨论(0)
发布评论:

提交评论
- 加载中...