Name columns within aggregate in R

逝去的感伤 2020-12-13 06:02

I know I can *re*name columns after I aggregate the data:

blubb <- aggregate(dat$two ~ dat$one, ...)
colnames(blubb) <- c("One", "Two")


        
4 answers
  • 2020-12-13 06:09
    # Four simulated groups; use `=` (not `<-`) so data.frame() names the columns
    w <- data.frame(Funding = "Fully Insured", Region = "North East", claim_count = rnbinom(1000, 300.503572818, mu = 0.5739467))
    x <- data.frame(Funding = "Fully Insured", Region = "South East", claim_count = rnbinom(1000, 1000, mu = 0.7))
    y <- data.frame(Funding = "Self Insured", Region = "North East", claim_count = rnbinom(1000, 400, mu = 0.8))
    z <- data.frame(Funding = "Self Insured", Region = "South East", claim_count = rnbinom(1000, 700, mu = 1.7))
    my_df <- rbind(w, x, y, z)
    # Sum claim counts per Funding/Region combination
    my_df2 <- with(my_df, aggregate(x = claim_count, by = list(Funding, Region), FUN = sum))
    # aggregate() returns Group.1, Group.2 and x; reuse the original column names
    colnames(my_df2) <- colnames(my_df)
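    A variant of the same aggregation that names the columns inside the call, so no renaming step is needed. This is a minimal sketch on a small made-up data frame standing in for my_df (the values are hypothetical, not the simulated data); the point is that aggregate() takes the result's column names from the names of the x and by lists.

    ```r
    # Hypothetical stand-in for my_df with deterministic values
    my_df <- data.frame(
      Funding     = c("Fully Insured", "Fully Insured", "Self Insured", "Self Insured"),
      Region      = c("North East", "North East", "North East", "South East"),
      claim_count = c(1, 2, 3, 4)
    )

    # Names in the x and by lists become the column names of the result
    my_df2 <- with(my_df, aggregate(
      x   = list(claim_count = claim_count),
      by  = list(Funding = Funding, Region = Region),
      FUN = sum
    ))
    print(my_df2)
    ```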
    
  • 2020-12-13 06:17

    If you prefer writing aggregates as formulas, the documentation shows the use of cbind. cbind lets you name its arguments, and aggregate then uses those names for the columns.

    blubb <- aggregate(cbind(Two = dat$two) ~ cbind(One = dat$one), ...)
    

    Aggregating more than one column by more than one grouping factor can be done like this:

    blubb <- aggregate(cbind(x = varX, y = varY, varZ) ~ cbind(a = facA) + cbind(b = facB) + facC, data=dat, FUN=sum)
    

    And if you want to apply more than one function:

    aggregate(cbind(cases=ncases, ncontrols) ~ cbind(alc=alcgp) + tobgp, data = esoph, FUN = function(x) c("mean" = mean(x), "median" = median(x)))
    
    #   alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
    #1    1 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
    #2    2 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
    #...
    

    This appends the name of each function to the column name (e.g. cases.mean, cases.median).

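    However, cbind replaces factors with their internal integer codes (alc above shows 1, 2, ... instead of the alcohol-group labels). To avoid this, you can use the data-frame method: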

    with(esoph, aggregate(data.frame(cases=ncases, ncontrols), data.frame(alc=alcgp, tobgp), FUN = function(x) c("mean" = mean(x), "median" = median(x))))
    
    #         alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
    #1  0-39g/day 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
    #2      40-79 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
    #...
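    The cases.mean and cases.median names come from a matrix column: when FUN returns a named vector, aggregate() stores one matrix per aggregated variable, with the function's element names as its column names. The sketch below illustrates this on the built-in chickwts data (chosen here only as an example) and then flattens the matrix into ordinary columns with do.call(data.frame, ...), a common idiom rather than something from this answer.

    ```r
    # FUN returns a named vector, so the value column becomes a matrix
    res <- aggregate(
      x   = list(weight = chickwts$weight),
      by  = list(feed = chickwts$feed),
      FUN = function(v) c(mean = mean(v), median = median(v))
    )
    str(res)      # res$weight is a matrix with columns "mean" and "median"

    # Flatten the matrix column into separate, suffixed columns
    flat <- do.call(data.frame, res)
    names(flat)   # "feed" "weight.mean" "weight.median"
    ```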
    
  • 2020-12-13 06:20

    You can use setNames as in:

    blubb <- setNames(aggregate(dat$two ~ dat$one, ...), c("One", "Two"))
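    A self-contained version of the setNames() approach. The question does not show its data, so dat here is a small hypothetical data frame with columns one and two, and FUN = sum stands in for the elided "...":

    ```r
    # Hypothetical stand-in for the question's dat
    dat <- data.frame(one = c("a", "a", "b"), two = c(1, 2, 5))

    # setNames() renames the result in the same expression as the aggregation
    blubb <- setNames(aggregate(dat$two ~ dat$one, FUN = sum), c("One", "Two"))
    print(blubb)   # columns: One (group), Two (sum)
    ```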
    

    Alternatively, you can bypass the slick formula method and use syntax like:

    # first list: values to aggregate; second list: grouping variables
    blubb <- aggregate(list(Two = dat$two), list(One = dat$one), ...)
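    A runnable sketch of the data-frame method on the same kind of hypothetical dat (FUN = sum again stands in for "..."). Note the argument order: the first list holds the values to aggregate (x) and the second the grouping variables (by); in the output the grouping columns come first.

    ```r
    # Hypothetical stand-in for the question's dat
    dat <- data.frame(one = c("a", "a", "b"), two = c(1, 2, 5))

    # x = values to aggregate, by = grouping variables; list names become column names
    blubb <- aggregate(list(Two = dat$two), by = list(One = dat$one), FUN = sum)
    print(blubb)   # columns: One (group), Two (sum)
    ```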
    

    Update

    This update is just meant to help you get started on deriving a solution of your own.

    If you inspect the code for stats:::aggregate.formula, you'll see the following lines towards the end:

    if (is.matrix(mf[[1L]])) {
        lhs <- as.data.frame(mf[[1L]])
        names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
        aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
    }
    else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
    

    If all you want is to append the function name to the name of the aggregated variable, you could change that to something like:

    if (is.matrix(mf[[1L]])) {
      lhs <- as.data.frame(mf[[1L]])
      names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
      myOut <- aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
      colnames(myOut) <- c(names(mf[-1L]), 
                           paste(names(lhs), deparse(substitute(FUN)), sep = "."))
    }
    else {
      myOut <- aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
      colnames(myOut) <- c(names(mf[-1L]), 
                           paste(strsplit(gsub("cbind\\(|\\)|\\s", "", 
                                               names(mf[1L])), ",")[[1]],
                                 deparse(substitute(FUN)), sep = "."))
    } 
    myOut
    

    This captures the expression supplied for FUN, as a string, using deparse(substitute(FUN)), so you could probably modify the function to accept a custom suffix, or even a vector of suffixes. This can probably be improved with some more work, but I'm not going to do it!

    Here is a Gist with this concept applied, creating a function named "myAgg".

    Here is some sample output of just the resulting column names:

    > names(myAgg(weight ~ feed, data = chickwts, mean))
    [1] "feed"        "weight.mean"
    > names(myAgg(breaks ~ wool + tension, data = warpbreaks, sum))
    [1] "wool"       "tension"    "breaks.sum"
    > names(myAgg(weight ~ feed, data = chickwts, FUN = function(x) mean(x^2)))
    [1] "feed"                         "weight.function(x) mean(x^2)"
    

    Notice that only the aggregated variable name changes. But notice also that if you use a custom function, you'll end up with a really strange column name!
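    One way to follow up on the custom-suffix idea without editing stats internals is a small wrapper, here called agg_suffix() (a hypothetical name, not the Gist's myAgg), that calls the formula method and then appends a suffix to the aggregated columns. The suffix defaults to the deparsed FUN, as described above, but can be overridden, which avoids the strange name for anonymous functions. This is a sketch that assumes a single variable on the left-hand side of the formula and grouping variables given by name on the right-hand side.

    ```r
    # Hypothetical wrapper (not the Gist's myAgg): aggregate with the formula
    # method, then append a suffix to the aggregated column(s) only
    agg_suffix <- function(formula, data, FUN,
                           suffix = paste(deparse(substitute(FUN)), collapse = " ")) {
      force(suffix)  # evaluate the default before FUN is used
      out <- aggregate(formula, data = data, FUN = FUN)
      # Grouping columns are the variables on the right-hand side of the formula
      is_value <- !(names(out) %in% all.vars(formula[[3L]]))
      names(out)[is_value] <- paste(names(out)[is_value], suffix, sep = ".")
      out
    }

    names(agg_suffix(weight ~ feed, data = chickwts, FUN = mean))
    # "feed" "weight.mean"
    names(agg_suffix(breaks ~ wool + tension, data = warpbreaks, FUN = sum))
    # "wool" "tension" "breaks.sum"
    names(agg_suffix(weight ~ feed, data = chickwts,
                     FUN = function(x) mean(x^2), suffix = "meansq"))
    # "feed" "weight.meansq"
    ```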

  • 2020-12-13 06:24

    The answer to your first question is yes. You can certainly include the column names in the aggregate function. Using the names from your example above:

    blubb <- aggregate(dat, list(One = dat$one, Two = dat$two), sum)

    I like the part about possibly pulling in the original column names automatically. If I figure it out, I'll post it.
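    On pulling in the original column names automatically: a minimal sketch, again with a hypothetical dat. If the variables are passed as bare column names (formula method with data =) or as single-bracket subsets of the data frame (data-frame method), aggregate() reuses the original names without any renaming step.

    ```r
    # Hypothetical stand-in for the question's dat
    dat <- data.frame(one = c("a", "a", "b"), two = c(1, 2, 5))

    # Formula method with data =: columns are named after the variables
    names(aggregate(two ~ one, data = dat, FUN = sum))         # "one" "two"

    # Data-frame method with dat["col"] subsets: names are carried over
    names(aggregate(dat["two"], by = dat["one"], FUN = sum))   # "one" "two"
    ```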
