Aggregating mixed data by factor column

旧时难觅i 2021-01-26 07:38

For the past week I have been trying to aggregate my dataset that consists of different weight measurements in different months accompanied by a large volume of background varia

1 answer
  •  面向向阳花
    2021-01-26 07:59

    You could write your own functions and then apply them to each column with lapply. First, write a function that finds the most frequent level of a factor variable:

    getmode <- function(v) {
      levels(v)[which.max(table(v))]
    }
    

    Then write a function that returns either the per-group mean or the per-group mode, depending on the type of the variable passed to it:

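    my_summary <- function(x, id, ...) {
      # Numeric columns: mean within each id
      if (is.numeric(x)) {
        return(tapply(x, id, mean))
      }
      # Factor columns: most frequent level within each id
      if (is.factor(x)) {
        return(tapply(x, id, getmode))
      }
      # Any other type falls through and returns NULL
    }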
    

    Finally, use lapply to calculate the summaries for every column:

    data.frame(lapply(df, my_summary, id = df$IDnumber))
      IDnumber Gender   Weight LikesSoda
    1        1   Male 81.33333        No
    2        2 Female 68.00000       Yes
    3        3 Female 52.00000       Yes
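
    To try the steps above end to end, here is a minimal reproducible sketch. The data frame is made up for illustration (it is not the asker's dataset); its values were only chosen to be consistent with the output shown above.

    ```r
    # Hypothetical sample data, NOT the asker's real dataset
    df <- data.frame(
      IDnumber  = c(1, 1, 1, 2, 2, 3),
      Gender    = factor(c("Male", "Male", "Male", "Female", "Female", "Female")),
      Weight    = c(80, 82, 82, 66, 70, 52),
      LikesSoda = factor(c("No", "No", "Yes", "Yes", "Yes", "Yes"))
    )

    # Most frequent level of a factor (first one wins on ties)
    getmode <- function(v) {
      levels(v)[which.max(table(v))]
    }

    # Mean for numeric columns, mode for factor columns, both per id
    my_summary <- function(x, id, ...) {
      if (is.numeric(x)) return(tapply(x, id, mean))
      if (is.factor(x))  return(tapply(x, id, getmode))
    }

    res <- data.frame(lapply(df, my_summary, id = df$IDnumber))
    print(res)
    ```

    Note that the id column itself is numeric here, so its per-group "mean" is simply the id again.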
    

    If two or more levels of a factor share the same maximum frequency, which.max will just return the first of them. I understand from your comment that you just want to know how many such ties there are, so one option might be to amend the getmode function slightly, so that it appends an asterisk to the level when there is a tie:

    getmode <- function(v) {
      tab <- table(v)
      top <- levels(v)[which.max(tab)]
      # Flag the result when more than one level has the maximum count
      if (sum(tab == max(tab)) > 1) return(paste(top, '*'))
      top
    }
    

    (After changing your sample data so that there is one Female and one Male with IDnumber == "2":)

    data.frame(lapply(df, my_summary, id = df$IDnumber))
    
      IDnumber   Gender   Weight LikesSoda
    1        1     Male 81.33333        No
    2        2 Female * 68.00000       Yes
    3        3   Female 52.00000       Yes
    

    I'm afraid that's a bit of a messy 'solution', but if you just want to get an idea of how common that issue is, perhaps it will be sufficient for your needs.
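
    If the goal is only to see how often ties occur, another option is to leave getmode alone and count the tied groups separately. The following is a rough sketch under that assumption; has_tie and count_ties are hypothetical helper names introduced here, and the data frame is again made up (one Female and one Male for IDnumber 2).

    ```r
    # Hypothetical helper: TRUE when two or more levels share the top count
    has_tie <- function(v) {
      tab <- table(v)
      sum(tab == max(tab)) > 1
    }

    # Hypothetical helper: for each factor column, count the ids whose mode is tied
    count_ties <- function(data, id) {
      factor_cols <- data[vapply(data, is.factor, logical(1))]
      vapply(factor_cols, function(x) sum(tapply(x, id, has_tie)), integer(1))
    }

    # Made-up sample data: IDnumber 2 has one Female and one Male
    df <- data.frame(
      IDnumber  = c(1, 1, 1, 2, 2, 3),
      Gender    = factor(c("Male", "Male", "Male", "Female", "Male", "Female")),
      Weight    = c(80, 82, 82, 66, 70, 52),
      LikesSoda = factor(c("No", "No", "Yes", "Yes", "Yes", "Yes"))
    )

    ties <- count_ties(df, df$IDnumber)
    print(ties)
    ```

    This returns one count per factor column, so the summary table itself stays free of asterisks.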
