How to find the statistical mode?

前端未结

关注

 30  1778

In R, mean() and median() are standard functions which do what you\'d expect. mode() tells you the internal storage mode of the objec

相关标签:

30条回答

陌清茗

2020-11-21 08:01

Here is a function to find the mode:

mode <- function(x) {
  unique_val <- unique(x)
  counts <- vector()
  for (i in 1:length(unique_val)) {
    counts[i] <- length(which(x==unique_val[i]))
  }
  position <- c(which(counts==max(counts)))
  if (mean(counts)==max(counts)) 
    mode_x <- 'Mode does not exist'
  else 
    mode_x <- unique_val[position]
  return(mode_x)
}

0 讨论(0)

梦谈多话

2020-11-21 08:02
Below is the code which can be use to find the mode of a vector variable in R.
```
a <- table([vector])

names(a[a==max(a)])
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
耶瑟儿～

2020-11-21 08:02
I was looking through all these options and started to wonder about their relative features and performances, so I did some tests. In case anyone else are curious about the same, I'm sharing my results here.

Not wanting to bother about all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on both character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and output should be 'sensible', i.e. no numerics as character or other such silliness.

I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:
```
library(magrittr)

Aksel <- function(x, freq=FALSE) {
    z <- 2
    if (freq) z <- 1:2
    run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
    colnames(run) <- c("freq", "value")
    run[which(run$freq==max(run$freq)), z] %>% as.vector   
}

set.seed(2)

F <- sample(c("yes", "no", "maybe", NA), 10, replace=TRUE) %>% factor
Aksel(F)

# [1] maybe yes  

C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace=TRUE)
Aksel(C, freq=TRUE)

# freq value
#    7 Steve
```
I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:

Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In matter of speed alone Kens version wins handily, but it is also the only one of these that will only report one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. In method="mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as a the cause.
0 讨论(0)
发布评论:

提交评论
- 加载中...
时光取名叫无心

2020-11-21 08:03
One more solution, which works for both numeric & character/factor data:
```
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
```
On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):
```
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
终归单人心

2020-11-21 08:03
found this on the r mailing list, hope it's helpful. It is also what I was thinking anyways. You'll want to table() the data, sort and then pick the first name. It's hackish but should work.
```
names(sort(-table(x)))[1]
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
忘了有多久

2020-11-21 08:03
I can't vote yet but Rasmus Bååth's answer is what I was looking for. However, I would modify it a bit allowing to contrain the distribution for example fro values only between 0 and 1.
```
estimate_mode <- function(x,from=min(x), to=max(x)) {
  d <- density(x, from=from, to=to)
  d$x[which.max(d$y)]
}
```
We aware that you may not want to constrain at all your distribution, then set from=-"BIG NUMBER", to="BIG NUMBER"
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2 3 4 5