How to run lm for each subset of the data frame, and then aggregate the result? [duplicate]

Question

I have a big data frame df, with columns named:

age, income, country

What I want to do is actually very simple: run

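fitFunc <- function(thisCountry) {
    # keep only the rows for this country (the trailing comma selects rows, all columns)
    subframe <- df[df$country == thisCountry, ]
    # regress income on age with no intercept
    fit <- lm(income ~ 0 + age, data = subframe)
    return(coef(fit))
}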

for each individual country, and then aggregate the results into a new data frame that looks like:

    countryname,  coeffname
1      USA         1.2
2      GB          1.0
3      France      1.1

I tried:

do.call("rbind", lapply(allRics[1:5], fitit))

but I don't know what to do next.

Can anyone help?

Thanks!

Answer 1:

Does this work for you?

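    set.seed(1)
    # example data: 100 rows with income, age and a three-level country factor
    df <- data.frame(income = rnorm(100, 100, 20),
                     age = rnorm(100, 40, 10),
                     country = factor(sample(1:3, 100, replace = TRUE),
                                      levels = 1:3, labels = c("us", "gb", "france")))

    # fit the no-intercept model per country, then stack the one-row results
    out <- lapply(levels(df$country), function(z) {
        data.frame(country = z,
                   age = coef(lm(income ~ 0 + age, data = df[df$country == z, ])),
                   row.names = NULL)
    })
    do.call(rbind, out)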
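
For comparison, the same per-country fit can also be sketched in base R with split() and sapply(). This is only an illustration on made-up data: the data frame dat and the names pieces, slopes and result are assumptions for this sketch, not taken from the question.

```r
# Illustrative data (assumed, not the asker's): 20 rows for each of three countries
set.seed(42)
dat <- data.frame(
  age     = runif(60, 18, 80),
  country = rep(c("us", "gb", "france"), each = 20)
)
dat$income <- 100 * dat$age + rnorm(60, 0, 500)

# split() returns a named list holding one data frame per country
pieces <- split(dat, dat$country)

# Fit the no-intercept model on each piece and keep its single slope
slopes <- sapply(pieces, function(d) coef(lm(income ~ 0 + age, data = d))[["age"]])

# Combine into one data frame with one row per country
result <- data.frame(country = names(slopes), age = unname(slopes))
print(result)
```

Because split() names the list by country, the country labels carry through sapply() and do not have to be passed into the fitting function separately.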

Answer 2:

Using @user20650's example data, this seems to produce the same result:

library(data.table)
dt <- data.table(df)
# one no-intercept fit per country; coef() avoids relying on partial matching of $coef
dt[, list(age = coef(lm(income ~ 0 + age))), by = country]

#    country      age
# 1:      gb 2.428830
# 2:      us 2.540879
# 3:  france 2.369560

You'll need to install the data.table package first, for example with install.packages("data.table").

Answer 3:

Note that the plyr package was created for tasks like this. It applies a function to each subset of the data and returns the results in a prespecified form. With ddply we pass in a data frame and get a data frame of results back. See the plyr example sessions and help files to learn more. It is well worth the effort to get acquainted with this package! See http://plyr.had.co.nz/ for a start.

library(plyr)
age <- runif(1000, 18, 80)
income <- 2000 + age*100 + rnorm(1000,0, 2000)
country <- factor(sample(LETTERS[1:10], 1000, replace = TRUE))
dat <- data.frame(age, income, country)

get.coef <- function(dat) lm(income ~ 0 + age, dat)$coefficients

ddply(dat, .(country), get.coef)

Source: https://stackoverflow.com/questions/16633921/how-to-run-lm-for-each-subset-of-the-data-frame-and-then-aggreage-the-result
