How to find the statistical mode?

时光取名叫无心 2020-11-21 07:00

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the objec

30 Answers
  • 2020-11-21 07:39

    There are multiple solutions provided for this one. I checked the first one and after that wrote my own. Posting it here if it helps anyone:

    Mode <- function(x){
      y <- data.frame(table(x))
      y[y$Freq == max(y$Freq),1]
    }
    

    Let's test it with a few examples. I am using the iris data set. First, numeric data:

    > Mode(iris$Sepal.Length)
    [1] 5
    

    which you can verify is correct.

    The only non-numeric field in the iris dataset (Species) does not have a single mode, since each of the three species occurs 50 times. Let's test with our own example:

    > test <- c("red","red","green","blue","red")
    > Mode(test)
    [1] red
    

    EDIT

    As mentioned in the comments, a user might want to preserve the input type. In that case the Mode function can be modified to:

    Mode <- function(x){
      y <- data.frame(table(x))
      z <- y[y$Freq == max(y$Freq),1]
      as(as.character(z),class(x))
    }
    

    The last line of the function simply coerces the final mode value to the type of the original input.
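
    A possible alternative, sketched here under the assumption that all tied values should be returned: unique() combined with tabulate(match()) keeps the input type without a round trip through character. The name mode_values is made up for this illustration and does not come from the answer above.

    ```r
    # Hypothetical helper (not from the answer above): returns every value
    # that attains the highest frequency, in the same type as the input.
    mode_values <- function(x) {
      ux <- unique(x)                    # distinct values, original type kept
      counts <- tabulate(match(x, ux))   # frequency of each distinct value
      ux[counts == max(counts)]          # all values tied for the top count
    }

    mode_values(c("red", "red", "green", "blue", "red"))  # "red"
    mode_values(c(2, 2, 3, 5, 5))                         # 2 5
    ```

    Because the result is a subset of unique(x), numeric input stays numeric and character input stays character.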

  • 2020-11-21 07:41

    I've written the following code in order to generate the mode.

    MODE <- function(dataframe){
        DF <- as.data.frame(dataframe)
    
        MODE2 <- function(x){      
            if (is.numeric(x) == FALSE){
                df <- as.data.frame(table(x))  
                df <- df[order(df$Freq), ]         
                m <- max(df$Freq)        
                MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))
    
                if (sum(df$Freq)/length(df$Freq)==1){
                    warning("No Mode: Frequency of all values is 1", call. = FALSE)
                }else{
                    return(MODE1)
                }
    
            }else{ 
                df <- as.data.frame(table(x))  
                df <- df[order(df$Freq), ]         
                m <- max(df$Freq)        
                MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))
    
                if (sum(df$Freq)/length(df$Freq)==1){
                    warning("No Mode: Frequency of all values is 1", call. = FALSE)
                }else{
                    return(MODE1)
                }
            }
        }
    
        return(as.vector(lapply(DF, MODE2)))
    }
    

    Let's try it:

    MODE(mtcars)
    MODE(CO2)
    MODE(ToothGrowth)
    MODE(InsectSprays)
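
    The per-column idea can also be sketched more compactly with lapply(). The data frame and the helper name col_mode below are invented for illustration. Unlike MODE above, this sketch returns all tied values and does not warn when every value occurs once.

    ```r
    # Made-up data frame for illustration
    df <- data.frame(
      cyl  = c(4, 6, 6, 8, 6),
      gear = c("a", "b", "b", "a", "b"),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: most frequent value(s) of one column, type preserved
    col_mode <- function(x) {
      ux <- unique(x)
      counts <- tabulate(match(x, ux))
      ux[counts == max(counts)]
    }

    modes <- lapply(df, col_mode)  # a named list, one element per column
    str(modes)
    ```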
    
  • 2020-11-21 07:41

    In case your observations are classes of real numbers and you expect the mode to be 2.5 when your observations are 2, 2, 3, and 3, you could estimate the mode with mode = l1 + i * (f1 - f0) / (2 * f1 - f0 - f2), where l1 is the lower limit of the most frequent class, f1 is the frequency of the most frequent class, f0 is the frequency of the class before the most frequent class, f2 is the frequency of the class after the most frequent class, and i is the class interval, as given e.g. in 1, 2, 3:

    #Small Example
    x <- c(2,2,3,3) #Observations
    i <- 1          #Class interval
    
    z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot = FALSE) #Calculate frequency of classes
    mf <- which.max(z$counts)   #index of most frequent class
    zc <- z$counts
    z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 2.5
    
    
    #Larger Example
    set.seed(0)
    i <- 5          #Class interval
    x <- round(rnorm(100,mean=100,sd=10)/i)*i #Observations
    
    z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot = FALSE)
    mf <- which.max(z$counts)
    zc <- z$counts
    z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 99.5
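
    The two examples repeat the same steps, so they could be wrapped in a function. This is a minimal sketch: the name grouped_mode is hypothetical, and it assumes the observations lie on a grid of spacing i (as in both examples), so that the padded breaks leave an empty class on each side of the data.

    ```r
    # Hypothetical wrapper around the grouped-data formula above.
    # x: observations, i: class interval (both as in the examples).
    grouped_mode <- function(x, i) {
      # Pad the range by 1.5 * i so the modal class has a neighbour on each side
      z <- hist(x, breaks = seq(min(x) - 1.5 * i, max(x) + 1.5 * i, i), plot = FALSE)
      zc <- z$counts
      mf <- which.max(zc)     # index of the (first) most frequent class
      f1 <- zc[mf]            # frequency of the most frequent class
      f0 <- zc[mf - 1]        # frequency of the class before it
      f2 <- zc[mf + 1]        # frequency of the class after it
      z$breaks[mf] + i * (f1 - f0) / (2 * f1 - f0 - f2)
    }

    grouped_mode(c(2, 2, 3, 3), 1)     # 2.5
    grouped_mode(c(1, 2, 2, 2, 3), 1)  # 2
    ```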
    

    In case you want the most frequent level and you have more than one most frequent level you can get all of them e.g. with:

    x <- c(2,2,3,5,5)
    names(which(max(table(x))==table(x)))
    #"2" "5"
    
  • 2020-11-21 07:41

    The mode is mostly calculated for factor variables, in which case we can use

    names(which.max(table(HouseVotes84$V1)))
    

    HouseVotes84 is a dataset available in the 'mlbench' package.

    It returns the label of the most frequent value. Relying on built-in functions this way is easier than writing a separate function.
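
    For readers without the mlbench package, the same idea can be tried on a small made-up factor. This is only a sketch: the votes vector is invented, and which.max() returns the first label when several are tied.

    ```r
    # Made-up factor standing in for a column such as HouseVotes84$V1
    votes <- factor(c("y", "n", "y", NA, "y", "n"))

    counts <- table(votes)      # NA is dropped by default
    names(which.max(counts))    # "y"
    ```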

  • 2020-11-21 07:43

    The mode is not meaningful in every situation, for example when several values share the highest frequency, so the function should handle those cases. Try the following function.

    Mode <- function(v) {
      # unique values in the input
      uniqv <- unique(v)
      # frequency of each unique value
      freq <- tabulate(match(v, uniqv))
      # frequency of the most frequent value
      m1 <- max(freq)
      n <- length(freq)
      if (n > 1) {
        # frequency of the second most frequent value
        m2 <- sort(freq, partial = n - 1)[n - 1]
        if (m1 != m2) {
          # return the most repeated value
          mode <- uniqv[which.max(freq)]
        } else {
          mode <- "Two or more values have same frequency. So mode can't be calculated."
        }
      } else {
        # all elements are the same
        mode <- uniqv
      }
      return(mode)
    }
    

    Output:

    x1 <- c(1,2,3,3,3,4,5)
    Mode(x1)
    # [1] 3
    
    x2 <- c(1,2,3,4,5)
    Mode(x2)
    # [1] "Two or more values have same frequency. So mode can't be calculated."
    
    x3 <- c(1,1,2,3,3,4,5)
    Mode(x3)
    # [1] "Two or more values have same frequency. So mode can't be calculated."
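
    One drawback of returning a message is that the result is sometimes numeric and sometimes character. A possible variant, sketched here under the same "unique mode or nothing" rule, returns NA of the input's type on ties. The name unique_mode is hypothetical.

    ```r
    # Hypothetical variant: ties yield NA of the input's type
    # instead of a character message.
    unique_mode <- function(v) {
      uniqv <- unique(v)
      freq <- tabulate(match(v, uniqv))
      top <- which(freq == max(freq))   # positions of the most frequent value(s)
      if (length(top) == 1) uniqv[top] else v[NA_integer_]
    }

    unique_mode(c(1, 2, 3, 3, 3, 4, 5))  # 3
    unique_mode(c(1, 2, 3, 4, 5))        # NA
    unique_mode(c(1, 1, 2, 3, 3, 4, 5))  # NA
    ```

    Callers can then test the result with is.na() rather than comparing against a string.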
    
  • 2020-11-21 07:44

    The modeest package provides estimators of the mode of univariate unimodal (and sometimes multimodal) data, as well as the values of the modes of the usual probability distributions.

    mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
    
    library(modeest)
    mlv(mySamples, method = "mfv")
    
    Mode (most likely value): 19 
    Bickel's modal skewness: -0.1 
    Call: mlv.default(x = mySamples, method = "mfv")
    

    For more information see this page
