How to find the statistical mode?

前端未结

关注

 30  1770

In R, mean() and median() are standard functions which do what you\'d expect. mode() tells you the internal storage mode of the objec

相关标签:

30条回答

囚心锁ツ

2020-11-21 07:48

A small modification to Ken Williams' answer, adding optional params na.rm and return_multiple.

Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
  if(na.rm){
    x <- na.omit(x)
  }
  ux <- unique(x)
  freq <- tabulate(match(x, ux))
  mode_loc <- if(return_multiple) which(freq==max(freq)) else which.max(freq)
  return(ux[mode_loc])
}

To show it works with the optional params and maintains data type:

foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)

str(stat_mode(foo)) # int [1:3] 2 4 NA
str(stat_mode(bar)) # chr [1:3] "mouse" "cat" NA
str(stat_mode(bar, na.rm=T)) # chr [1:2] "mouse" "cat"
str(stat_mode(bar, return_mult=F, na.rm=T)) # chr "mouse"

Thanks to @Frank for simplification.

0 讨论(0)

青春惊慌失措

2020-11-21 07:49
This builds on jprockbelly's answer, by adding a speed up for very short vectors. This is useful when applying mode to a data.frame or datatable with lots of small groups:
```
Mode <- function(x) {
   if ( length(x) <= 2 ) return(x[1])
   if ( anyNA(x) ) x = x[!is.na(x)]
   ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
一向

2020-11-21 07:52
It seems to me that if a collection has a mode, then its elements can be mapped one-to-one with the natural numbers. So, the problem of finding the mode reduces to producing such a mapping, finding the mode of the mapped values, then mapping back to some of the items in the collection. (Dealing with NA occurs at the mapping phase).

I have a histogram function that operates on a similar principal. (The special functions and operators used in the code presented herein should be defined in Shapiro and/or the neatOveRse. The portions of Shapiro and neatOveRse duplicated herein are so duplicated with permission; the duplicated snippets may be used under the terms of this site.) R pseudocode for histogram is
```
.histogram <- function (i)
        if (i %|% is.empty) integer() else
        vapply2(i %|% max %|% seqN, `==` %<=% i %O% sum)

histogram <- function(i) i %|% rmna %|% .histogram
```
(The special binary operators accomplish piping, currying, and composition) I also have a maxloc function, which is similar to which.max, but returns all the absolute maxima of a vector. R pseudocode for maxloc is
```
FUNloc <- function (FUN, x, na.rm=F)
        which(x == list(identity, rmna)[[na.rm %|% index.b]](x) %|% FUN)

maxloc <- FUNloc %<=% max

minloc <- FUNloc %<=% min # I'M THROWING IN minloc TO EXPLAIN WHY I MADE FUNloc
```
Then
```
imode <- histogram %O% maxloc
```
and
```
x %|% map %|% imode %|% unmap
```
will compute the mode of any collection, provided appropriate map-ping and unmap-ping functions are defined.
0 讨论(0)
发布评论:

提交评论
- 加载中...

耶瑟儿～

2020-11-21 07:53

Here are several ways you can do it in Theta(N) running time

from collections import defaultdict

def mode1(L):
    counts = defaultdict(int)
    for v in L:
        counts[v] += 1
    return max(counts,key=lambda x:counts[x])

def mode2(L):
    vals = set(L)
    return max(vals,key=lambda x: L.count(x))

def mode3(L):
    return max(set(L), key=lambda x: L.count(x))

0 讨论(0)

灰色年华

2020-11-21 07:53

Adding in raster::modal() as an option, although note that raster is a hefty package and may not be worth installing if you don't do geospatial work.

The source code could be pulled out of https://github.com/rspatial/raster/blob/master/src/modal.cpp and https://github.com/rspatial/raster/blob/master/R/modal.R into a personal R package, for those who are particularly keen.

0 讨论(0)
发布评论:

提交评论
- 加载中...
一个人的身影

2020-11-21 07:55
I found Ken Williams post above to be great, I added a few lines to account for NA values and made it a function for ease.
```
Mode <- function(x, na.rm = FALSE) {
  if(na.rm){
    x = x[!is.na(x)]
  }

  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...