create aggregate column based on variables with R [duplicate]

佐手、 submitted on 2019-12-25 03:29:18

Question


I apologize in advance if this is a beginner question, but I searched the forum and couldn't find a way to do what I am trying to do. I have a training set and I want to reduce the number of levels of my categorical variables (in the example below, the categorical variable is state). I would like to map each state to the mean (rate) of the class variable for that level. Once loaded into a data frame, my training set looks like this:

    state class mean
1      CA     1    0
2      AZ     1    0
3      NY     0    0
4      CA     0    0
5      NY     0    0
6      AZ     0    0
7      AZ     1    0
8      AZ     0    0
9      CA     0    0
10     VA     1    0

I would like the third column of my data frame to hold the mean of the class variable within each level of the first column (state). For example, the mean for the CA rows would be 0.333, since one of its three rows has class 1. The mean column could then be used as a replacement for the state column. Is there a good way to do this in R without writing an explicit loop?
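Written out, the value wanted for a state s is the share of its rows with class 1, i.e. the group mean below. The per-state arithmetic is worked from the ten rows in the table above.

```latex
\bar{y}_s = \frac{1}{n_s} \sum_{i \,:\, \mathrm{state}_i = s} \mathrm{class}_i
\qquad
\begin{aligned}
\bar{y}_{CA} &= \tfrac{1 + 0 + 0}{3} \approx 0.333 && \text{(rows 1, 4, 9)}\\
\bar{y}_{AZ} &= \tfrac{1 + 0 + 1 + 0}{4} = 0.5 && \text{(rows 2, 6, 7, 8)}\\
\bar{y}_{NY} &= \tfrac{0 + 0}{2} = 0 && \text{(rows 3, 5)}\\
\bar{y}_{VA} &= \tfrac{1}{1} = 1 && \text{(row 10)}
\end{aligned}
```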

How should I map new levels (for example, states that did not appear in my training set)? Links to approaches in R would be greatly appreciated.
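One possible way to handle unseen levels, sketched below under stated assumptions: learn a per-state lookup table from the training data, match new states against it with `match()`, and fall back to the overall training class rate when there is no match. The global-mean fallback is an assumption (one common choice, not the only one), and the names `train`, `lookup`, `encode_state`, `new_data` and the state "TX" are hypothetical, used only for illustration.

```r
# Training data from the question
train <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# Lookup table learned from training: one row per state, column "class" holds its mean
lookup <- aggregate(class ~ state, data = train, FUN = mean)

# Fallback for states never seen in training (assumption: overall class rate)
global_mean <- mean(train$class)

# Hypothetical helper: map states to their training means, using the fallback
# for any state with no match in the lookup table
encode_state <- function(states, lookup, fallback) {
  idx <- match(as.character(states), as.character(lookup$state))  # NA when unseen
  encoded <- lookup$class[idx]
  encoded[is.na(encoded)] <- fallback
  encoded
}

# "TX" stands in for a state that was not in the training set
new_data <- data.frame(state = c("CA", "TX", "VA"))
new_data$mean <- encode_state(new_data$state, lookup, global_mean)
print(new_data)
```

States seen only once in training, such as VA, get an extreme value (1 here); blending each state's mean with the global mean is a common refinement that this sketch does not attempt.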


Answer 1:


This is exactly what the `ave` function was designed for. It can apply any function to a vector group by group, but its default function is `mean`, hence the name (`ave` as in "average"):

    dfrm$mean <- with(dfrm, ave(class, state))  # FUN = mean is the default
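A self-contained version of the one-liner above, rebuilding the question's data as `dfrm` (the column values come from the question's table; the object name follows the answer). `ave()` returns a vector the same length as `class`, with each element replaced by the mean of its state group, so it can be assigned straight back as a column.

```r
# Training data from the question
dfrm <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# Each row gets mean(class) over all rows sharing its state; row order is kept
dfrm$mean <- with(dfrm, ave(class, state))

print(dfrm)
```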



Answer 2:


    library(plyr)
    # Per-state mean of class, then left-join it back onto every row of data
    join(data, ddply(data, .(state), summarise, mean = mean(class)),
         by = "state", type = "left")
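For readers without plyr, a base-R sketch of the same summarise-then-join idea follows. It swaps in `aggregate()` for the per-state summary and `merge()` for the left join; the names `dfrm`, `state_means` and `result` are illustrative only.

```r
# Training data from the question
dfrm <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# One row per state with the mean of class, then rename that column to "mean"
state_means <- aggregate(class ~ state, data = dfrm, FUN = mean)
names(state_means)[names(state_means) == "class"] <- "mean"

# Left join: all.x = TRUE keeps every original row
result <- merge(dfrm, state_means, by = "state", all.x = TRUE)
print(result)
```

Unlike `plyr::join`, `merge()` sorts the result by the key column by default, so the rows come back grouped by state rather than in their original order; `ave()` avoids the reordering entirely.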


Source: https://stackoverflow.com/questions/8735283/create-aggregate-column-based-on-variables-with-r
