Question
I have a big data frame df, with columns named:
age, income, country
What I want to do is actually very simple: run
fitFunc <- function(thisCountry) {
  # keep only the rows for this country (row index, then a trailing comma)
  subframe <- df[df$country == thisCountry, ]
  fit <- lm(income ~ 0 + age, data = subframe)
  return(coef(fit))
}
for each individual country, then aggregate the results into a new data frame that looks like:
  countryname coeffname
1 USA         1.2
2 GB          1.0
3 France      1.1
I tried:
do.call("rbind", lapply(allRics[1:5], fitit))
but I don't know what to do next.
Can anyone help?
Thanks!
Answer 1:
Does this work for you?
set.seed(1)
df <- data.frame(income = rnorm(100, 100, 20),
                 age = rnorm(100, 40, 10),
                 country = factor(sample(1:3, 100, replace = TRUE),
                                  levels = 1:3, labels = c("us", "gb", "france")))
# Fit income ~ 0 + age within each country and return one row per country
out <- lapply(levels(df$country), function(z) {
  data.frame(country = z,
             age = coef(lm(income ~ 0 + age, data = df[df$country == z, ])),
             row.names = NULL)
})
do.call(rbind, out)
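As a possible variation on the same idea, the per-country fit can also be sketched with base R's split() and sapply(), which avoids subsetting by hand. This is only a minimal sketch that reuses the example data from this answer. The names slopes and result are illustrative and do not appear in the original post.

```r
# Example data, same construction as in the answer above
set.seed(1)
df <- data.frame(
  income  = rnorm(100, 100, 20),
  age     = rnorm(100, 40, 10),
  country = factor(sample(1:3, 100, replace = TRUE),
                   levels = 1:3, labels = c("us", "gb", "france"))
)

# split() yields one data frame per country level; sapply() fits the
# no-intercept model on each piece and collects the single slope
slopes <- sapply(split(df, df$country), function(sub) {
  unname(coef(lm(income ~ 0 + age, data = sub)))
})

# Assemble the named vector into a two-column data frame
# (hypothetical names: slopes, result)
result <- data.frame(country = names(slopes), age = unname(slopes))
result
```

Here sapply() returns a numeric vector named by country, so the final data frame has one row per factor level, in the order of the levels.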
Answer 2:
Using @user20650's example data, this seems to produce the same result:
require(data.table)
dt <- data.table(df)
dt[,list(age=lm(income~0+age)$coef),by=country]
# country age
# 1: gb 2.428830
# 2: us 2.540879
# 3: france 2.369560
You'll need to install the data.table package first.
Answer 3:
Note that the plyr package was created for tasks like this. It applies a function to each subset of the data and returns the results in a prespecified form. With ddply, we pass in a data frame and get a data frame of results back. See the plyr example sessions and help files to learn more. It is well worth the effort to get acquainted with this package!
See http://plyr.had.co.nz/ for a start.
library(plyr)
age <- runif(1000, 18, 80)
income <- 2000 + age * 100 + rnorm(1000, 0, 2000)
country <- factor(sample(LETTERS[1:10], 1000, replace = TRUE))
dat <- data.frame(age, income, country)
# Return the no-intercept slope of income on age for one subset
get.coef <- function(dat) lm(income ~ 0 + age, data = dat)$coefficients
ddply(dat, .(country), get.coef)
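One way to sanity-check per-country results like these: for a model with no intercept (income ~ 0 + age), the least-squares slope has the closed form sum(age * income) / sum(age^2). The sketch below compares that formula with lm() on each subset using base R only. The seed and the names closed_form and lm_based are assumptions added for illustration and are not part of the original answer.

```r
# Same simulated data as above; the seed is an added assumption
# so that the run is reproducible
set.seed(42)
age <- runif(1000, 18, 80)
income <- 2000 + age * 100 + rnorm(1000, 0, 2000)
country <- factor(sample(LETTERS[1:10], 1000, replace = TRUE))
dat <- data.frame(age, income, country)

pieces <- split(dat, dat$country)

# Closed-form slope for regression through the origin
closed_form <- sapply(pieces, function(d) sum(d$age * d$income) / sum(d$age^2))

# Slope as estimated by lm() on the same subsets
lm_based <- sapply(pieces, function(d) {
  unname(coef(lm(income ~ 0 + age, data = d)))
})

# TRUE when the two agree up to numerical tolerance
all.equal(closed_form, lm_based)
```

If the two vectors agree, the grouped fitting step is doing what the formula says it should, whichever package performs the grouping.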
Source: https://stackoverflow.com/questions/16633921/how-to-run-lm-for-each-subset-of-the-data-frame-and-then-aggreage-the-result