How to find the statistical mode?

时光取名叫无心 2020-11-21 07:00

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the objec

30 Answers
  • 2020-11-21 07:56

    I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution:

    # Compute the density once; the second positional argument of density() is bw, so bw = 2 here.
    function(x) { d <- density(x, 2); d$x[d$y == max(d$y)] }
    

    where x is the data collection. Pay attention to the smoothing: the second positional argument of density() is the bandwidth bw (so density(x, 2) uses bw = 2), while the adjust parameter multiplies that bandwidth.
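
    To see how the two smoothing arguments differ, here is a minimal sketch on simulated data. The helper name `density_mode`, the seed, and the simulated sample are assumptions made for illustration and are not part of the answer above.

```r
# Hypothetical helper: location of the highest point of the kernel density estimate.
density_mode <- function(x, ...) {
  d <- density(x, ...)
  d$x[which.max(d$y)]
}

set.seed(1)                            # arbitrary seed, only for reproducibility
x <- rnorm(500, mean = 10, sd = 1)     # simulated sample whose true mode is 10

d_default <- density(x)                # default bandwidth rule ("nrd0")
d_adjust  <- density(x, adjust = 2)    # default bandwidth multiplied by 2
d_bw      <- density(x, bw = 2)        # bandwidth fixed at 2, same as density(x, 2)

# Bandwidths actually used by each call
c(default = d_default$bw, adjust = d_adjust$bw, bw = d_bw$bw)

# Smoothed mode estimate
density_mode(x, adjust = 2)
```

    A larger bandwidth gives a smoother curve and a more stable, but possibly more biased, location for the maximum.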

  • 2020-11-21 07:58

    This is based on @Chris's function for calculating the mode or related metrics, but uses Ken Williams's method to calculate frequencies. It adds a fix for the case of no mode at all (all elements equally frequent) and uses more readable method names.

    Mode <- function(x, method = "one", na.rm = FALSE) {
      x <- unlist(x)
      if (na.rm) {
        x <- x[!is.na(x)]
      }
    
      # Get unique values
      ux <- unique(x)
      n <- length(ux)
    
      # Get frequencies of all unique values
      frequencies <- tabulate(match(x, ux))
      modes <- frequencies == max(frequencies)
    
      # Determine number of modes
      nmodes <- sum(modes)
      nmodes <- ifelse(nmodes==n, 0L, nmodes)
    
      if (method %in% c("one", "mode", "") | is.na(method)) {
        # Return NA if not exactly one mode, else return the mode
        if (nmodes != 1) {
          return(NA)
        } else {
          return(ux[which(modes)])
        }
      } else if (method %in% c("n", "nmodes")) {
        # Return the number of modes
        return(nmodes)
      } else if (method %in% c("all", "modes")) {
        # Return NA if no modes exist, else return all modes
        if (nmodes > 0) {
          return(ux[which(modes)])
        } else {
          return(NA)
        }
      }
      # warning() already labels its output as a warning, so the message does not repeat the word
      warning("method not recognised. Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
    }
    

    Since it uses Ken's method to calculate frequencies, performance is also optimised. Using AkselA's post, I benchmarked some of the previous answers to show that my function is close to Ken's in performance, with the conditionals for the various output options causing only minor overhead.
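
    The frequency-counting step used in the function, tabulate(match(x, ux)), can be looked at in isolation. The following is a small sketch with made-up character data (the vector `x` is an assumption for illustration). It shows why this approach also works for non-numeric input.

```r
x <- c("b", "a", "c", "a", "b", "a")   # made-up sample data

ux <- unique(x)                # distinct values in order of first appearance: "b" "a" "c"
idx <- match(x, ux)            # position of each element of x within ux
frequencies <- tabulate(idx)   # count per distinct value, aligned with ux

frequencies                    # 2 3 1
ux[which.max(frequencies)]     # "a"
```

    Because match() maps every element to a small positive integer, tabulate() never has to handle the raw values themselves.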

  • 2020-11-21 07:58

    Another simple option that gives all values ordered by frequency is to use rle:

    df <- as.data.frame(unclass(rle(sort(mySamples))))
    df <- df[order(-df$lengths), ]
    head(df)
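
    As a worked illustration of the rle approach, here is a sketch on a small sample. The data are borrowed from another answer on this page, and the intermediate variable `runs` is introduced here only for readability.

```r
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

runs <- rle(sort(mySamples))          # after sorting, equal values form consecutive runs
df <- as.data.frame(unclass(runs))    # columns: lengths (counts) and values
df <- df[order(-df$lengths), ]        # most frequent values first

head(df, 2)
df$values[1]                          # 19, which occurs three times
```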
    
  • 2020-11-21 08:00

    Here is another solution:

    freq <- tapply(mySamples, mySamples, length)
    # or: freq <- table(mySamples)
    # names(freq) holds the values as strings, so convert back to numeric
    as.numeric(names(freq)[which.max(freq)])
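
    Note that the names of the frequency table are character strings, so the as.numeric() conversion only makes sense for numeric data, and which.max() reports only the first of several tied values. A minimal sketch of a variant that returns every tied value follows; the helper name `table_modes` is hypothetical.

```r
# Hypothetical helper: all most frequent values, returned as character strings.
table_modes <- function(x) {
  freq <- table(x)
  names(freq)[as.vector(freq) == max(freq)]
}

table_modes(c("red", "blue", "red", "green", "blue"))   # "blue" "red"
as.numeric(table_modes(c(19, 4, 19, 29, 29, 5)))        # 19 29
```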
    
  • 2020-11-21 08:00

    R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

    However, the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use a construct like the following (and to turn it into a function if you use it often):

    mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
    tabSmpl <- tabulate(mySamples)   # counts for 1..max(mySamples); needs positive integers
    SmplMode <- which(tabSmpl == max(tabSmpl))
    if (sum(tabSmpl == max(tabSmpl)) > 1) SmplMode <- NA   # tie: no single mode
    SmplMode
    ## [1] 19
    

    For larger samples, consider storing max(tabSmpl) in a temporary variable (I don't know whether R would optimize the repeated call automatically).
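
    Wrapped into a function, with the maximum stored once, the construct might look like the sketch below. The function name `sample_mode` and the input check are assumptions added for illustration.

```r
# Hypothetical wrapper; tabulate() only counts positive integers, so other input is rejected.
sample_mode <- function(x) {
  stopifnot(is.numeric(x), all(x >= 1), all(x == round(x)))
  counts <- tabulate(x)
  top <- max(counts)               # computed once and reused
  modes <- which(counts == top)
  if (length(modes) > 1) NA else modes
}

sample_mode(c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19))   # 19
sample_mode(c(1, 1, 2, 2))                            # NA, because of the tie
```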

    Reference: see "How about median and mode?" in this KickStarting R lesson
    This seems to confirm that (at least as of the writing of this lesson) there isn't a statistical mode function in R (mode(), as you found out, reports the storage mode of an object).

  • 2020-11-21 08:01

    A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continuous univariate distribution (e.g. a normal distribution) is to define and use the following function:

    estimate_mode <- function(x) {
      d <- density(x)
      d$x[which.max(d$y)]
    }
    

    Then to get the mode estimate:

    x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
    estimate_mode(x)
    ## 5.439788
    