How to run lm for each subset of the data frame, and then aggregate the result? [duplicate]

心不动则不痛 submitted on 2019-12-08 03:45:08

Question


I have a large data frame df with the columns:

age, income, country

What I want to do is actually very simple: run

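fitFunc <- function(thisCountry) {
    # Keep only this country's rows (refer to df$country, and select rows with a trailing comma)
    subframe <- df[df$country == thisCountry, ]
    # Regression through the origin: income explained by age, no intercept
    fit <- lm(income ~ 0 + age, data = subframe)
    coef(fit)
}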

for each country, and then aggregate the results into a new data frame that looks like this:

    countryname  coeffname
1   USA          1.2
2   GB           1.0
3   France       1.1

I tried:

do.call("rbind", lapply(allRics[1:5], fitFunc))

but I don't know what to do next.

Can anyone help?

Thanks!


Answer 1:


Does this work for you?

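    set.seed(1)
    # Example data: 100 rows spread over three countries
    df <- data.frame(income = rnorm(100, 100, 20),
                     age = rnorm(100, 40, 10),
                     country = factor(sample(1:3, 100, replace = TRUE),
                                      levels = 1:3, labels = c("us", "gb", "france")))

    # Fit income ~ 0 + age on each country's rows, giving a one-row data frame per country
    out <- lapply(levels(df$country), function(z) {
        data.frame(country = z, age = coef(lm(income ~ 0 + age, data = df[df$country == z, ])),
                   row.names = NULL)
    })
    # Stack the one-row data frames into a single table
    do.call(rbind, out)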
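A closely related base-R sketch, assuming the same example df: split() hands lapply-style code one data frame per country directly, so there is no manual subsetting, and vapply() guarantees exactly one numeric slope per country. The names pieces, slopes and res are illustrative only.

    set.seed(1)
    df <- data.frame(income = rnorm(100, 100, 20),
                     age = rnorm(100, 40, 10),
                     country = factor(sample(1:3, 100, replace = TRUE),
                                      levels = 1:3, labels = c("us", "gb", "france")))

    # split() returns a named list of data frames, one per level of df$country
    pieces <- split(df, df$country)
    # Fit the no-intercept model on each piece and keep its single "age" coefficient
    slopes <- vapply(pieces, function(d) coef(lm(income ~ 0 + age, data = d))[["age"]],
                     numeric(1))
    # Turn the named vector into the requested country / coefficient table
    res <- data.frame(country = names(slopes), age = unname(slopes))
    res

The rows come out in factor-level order (us, gb, france), because split() orders its pieces by level.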



Answer 2:


Using @user20650's example data, this seems to produce the same result:

library(data.table)
dt <- data.table(df)
# Inside j, income and age refer to the columns of the current country's group
dt[, list(age = coef(lm(income ~ 0 + age))), by = country]

#    country      age
# 1:      gb 2.428830
# 2:      us 2.540879
# 3:  france 2.369560

You'll need to install the data.table package first.
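Every answer here fits a regression through the origin (income ~ 0 + age), so each country's coefficient has a closed form. Minimising the sum of (income - b * age)^2 over b and setting the derivative to zero gives b = sum(age * income) / sum(age^2). The sketch below assumes the same example df and checks that this formula agrees with lm() for every country; closed_form and via_lm are illustrative names.

    set.seed(1)
    df <- data.frame(income = rnorm(100, 100, 20),
                     age = rnorm(100, 40, 10),
                     country = factor(sample(1:3, 100, replace = TRUE),
                                      levels = 1:3, labels = c("us", "gb", "france")))

    pieces <- split(df, df$country)
    # Least-squares slope with no intercept: b = sum(x * y) / sum(x^2)
    closed_form <- sapply(pieces, function(d) sum(d$age * d$income) / sum(d$age^2))
    # The same quantity as estimated by lm()
    via_lm <- sapply(pieces, function(d) coef(lm(income ~ 0 + age, data = d))[["age"]])
    all.equal(closed_form, via_lm)

The two differ only by floating-point rounding, which is why all.equal() is used here rather than identical().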




Answer 3:


Note that the plyr package was created for tasks like this. It applies a function to each subset of the data and returns the results in a prespecified form. With ddply, you pass in a data frame and get back a data frame of results. See the plyr example sessions and help files to learn more; it is well worth the effort to get acquainted with this package! See http://plyr.had.co.nz/ for a start.

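library(plyr)
# Simulated data: 1000 people in 10 countries, income rising with age plus noise
age <- runif(1000, 18, 80)
income <- 2000 + age*100 + rnorm(1000, 0, 2000)
country <- factor(sample(LETTERS[1:10], 1000, replace = TRUE))
dat <- data.frame(age, income, country)

# Coefficient of a no-intercept regression of income on age, for one subset
get.coef <- function(dat) coef(lm(income ~ 0 + age, data = dat))

# Split dat by country, apply get.coef to each piece, combine into a data frame
ddply(dat, .(country), get.coef)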
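ddply turns the named vector returned for each country into one row of the result. The same split-apply-combine pattern can be sketched in base R without plyr. Purely to show that a function returning several coefficients yields several columns, this sketch fits a model *with* an intercept (income ~ age), which is a change from the question's model. The helper get_coefs and the seed are hypothetical.

    set.seed(123)
    age <- runif(1000, 18, 80)
    income <- 2000 + age * 100 + rnorm(1000, 0, 2000)
    country <- factor(sample(LETTERS[1:10], 1000, replace = TRUE))
    dat <- data.frame(age, income, country)

    # Hypothetical helper: all coefficients of a model with an intercept
    get_coefs <- function(d) coef(lm(income ~ age, data = d))

    # Split: one data frame per country. Apply: fit each. Combine: stack as matrix rows
    coef_list <- lapply(split(dat, dat$country), get_coefs)
    coef_mat <- do.call(rbind, coef_list)
    # Row names of coef_mat are the country labels; move them into a column
    res <- data.frame(country = rownames(coef_mat), coef_mat,
                      row.names = NULL, check.names = FALSE)
    res

check.names = FALSE keeps the column name "(Intercept)" as lm() reports it, instead of converting it to "X.Intercept.".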


Source: https://stackoverflow.com/questions/16633921/how-to-run-lm-for-each-subset-of-the-data-frame-and-then-aggreage-the-result
