How to find the statistical mode?

前端 未结 30 1641
时光取名叫无心
时光取名叫无心 2020-11-21 07:00

In R, mean() and median() are standard functions which do what you\'d expect. mode() tells you the internal storage mode of the objec

相关标签:
30条回答
  • 2020-11-21 07:48

    A small modification to Ken Williams' answer, adding optional params na.rm and return_multiple.

    Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

    stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
      if(na.rm){
        x <- na.omit(x)
      }
      ux <- unique(x)
      freq <- tabulate(match(x, ux))
      mode_loc <- if(return_multiple) which(freq==max(freq)) else which.max(freq)
      return(ux[mode_loc])
    }
    

    To show it works with the optional params and maintains data type:

    foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
    bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)
    
    str(stat_mode(foo)) # int [1:3] 2 4 NA
    str(stat_mode(bar)) # chr [1:3] "mouse" "cat" NA
    str(stat_mode(bar, na.rm=T)) # chr [1:2] "mouse" "cat"
    str(stat_mode(bar, return_mult=F, na.rm=T)) # chr "mouse"
    

    Thanks to @Frank for simplification.

    0 讨论(0)
  • 2020-11-21 07:49

    This builds on jprockbelly's answer, by adding a speed up for very short vectors. This is useful when applying mode to a data.frame or datatable with lots of small groups:

    Mode <- function(x) {
       if ( length(x) <= 2 ) return(x[1])
       if ( anyNA(x) ) x = x[!is.na(x)]
       ux <- unique(x)
       ux[which.max(tabulate(match(x, ux)))]
    }
    
    0 讨论(0)
  • 2020-11-21 07:52

    It seems to me that if a collection has a mode, then its elements can be mapped one-to-one with the natural numbers. So, the problem of finding the mode reduces to producing such a mapping, finding the mode of the mapped values, then mapping back to some of the items in the collection. (Dealing with NA occurs at the mapping phase).

    I have a histogram function that operates on a similar principal. (The special functions and operators used in the code presented herein should be defined in Shapiro and/or the neatOveRse. The portions of Shapiro and neatOveRse duplicated herein are so duplicated with permission; the duplicated snippets may be used under the terms of this site.) R pseudocode for histogram is

    .histogram <- function (i)
            if (i %|% is.empty) integer() else
            vapply2(i %|% max %|% seqN, `==` %<=% i %O% sum)
    
    histogram <- function(i) i %|% rmna %|% .histogram
    

    (The special binary operators accomplish piping, currying, and composition) I also have a maxloc function, which is similar to which.max, but returns all the absolute maxima of a vector. R pseudocode for maxloc is

    FUNloc <- function (FUN, x, na.rm=F)
            which(x == list(identity, rmna)[[na.rm %|% index.b]](x) %|% FUN)
    
    maxloc <- FUNloc %<=% max
    
    minloc <- FUNloc %<=% min # I'M THROWING IN minloc TO EXPLAIN WHY I MADE FUNloc
    

    Then

    imode <- histogram %O% maxloc
    

    and

    x %|% map %|% imode %|% unmap
    

    will compute the mode of any collection, provided appropriate map-ping and unmap-ping functions are defined.

    0 讨论(0)
  • 2020-11-21 07:53

    Here are several ways you can do it in Theta(N) running time

    from collections import defaultdict
    
    def mode1(L):
        counts = defaultdict(int)
        for v in L:
            counts[v] += 1
        return max(counts,key=lambda x:counts[x])
    
    
    def mode2(L):
        vals = set(L)
        return max(vals,key=lambda x: L.count(x))
    
    def mode3(L):
        return max(set(L), key=lambda x: L.count(x))
    
    0 讨论(0)
  • 2020-11-21 07:53

    Adding in raster::modal() as an option, although note that raster is a hefty package and may not be worth installing if you don't do geospatial work.

    The source code could be pulled out of https://github.com/rspatial/raster/blob/master/src/modal.cpp and https://github.com/rspatial/raster/blob/master/R/modal.R into a personal R package, for those who are particularly keen.

    0 讨论(0)
  • 2020-11-21 07:55

    I found Ken Williams post above to be great, I added a few lines to account for NA values and made it a function for ease.

    Mode <- function(x, na.rm = FALSE) {
      if(na.rm){
        x = x[!is.na(x)]
      }
    
      ux <- unique(x)
      return(ux[which.max(tabulate(match(x, ux)))])
    }
    
    0 讨论(0)
提交回复
热议问题