I need to run a regression on panel data that has 3 dimensions (Year * Company * Country). For example:
============================================
year | comp | count | value.x | value.y
------+------+-------+----------+-----------
2000 | A | USA | 1029.0 | 239481
------+------+-------+----------+-----------
2000 | A | CAN | 2341.4 | 129333
------+------+-------+----------+-----------
2000 | B | USA | 2847.7 | 187319
------+------+-------+----------+-----------
2000 | B | CAN | 4820.5 | 392039
------+------+-------+----------+-----------
2001 | A | USA | 7289.9 | 429481
------+------+-------+----------+-----------
2001 | A | CAN | 5067.3 | 589143
------+------+-------+----------+-----------
2001 | B | USA | 7847.8 | 958234
------+------+-------+----------+-----------
2001 | B | CAN | 9820.0 | 1029385
============================================
However, the R package plm does not seem able to cope with more than 2 dimensions.
I have tried
result <- plm(value.y ~ value.x, data = dataname, index = c("comp","count","year"))
and it returns an error:
Error in pdata.frame(data, index) :
'index' can be of length 2 at the most (one individual and one time index)
How do you run regressions when the panel data (individual * time) has more than 1 dimension within "individual"?
In case anyone encounters the same situation, I'll put my solution here:
plm seems unable to cope with this situation directly, so the workaround I used is to add dummies. If the categorical variable you build the dummies from contains too many categories to do this by hand, you can try this:
makedummy <- function(colnum, data, interaction = FALSE, interaction_varnum)
{
  # Build column names such as "cate.dummy.A" (plain) or "cate.dummy.int.A" (interaction)
  char0 <- colnames(data)[colnum]
  char1 <- "dummy"
  tmp <- unique(data[, colnum])
  valname <- paste(char0, char1, tmp, sep = ".")
  valname_int <- paste(char0, char1, "int", tmp, sep = ".")
  # The last category is skipped so that it serves as the reference level
  for (i in seq_len(length(tmp) - 1))
  {
    # Rows that belong to the i-th category
    index <- data[, colnum] == tmp[i]
    if (!interaction)
    {
      tmp_dummy <- ifelse(index, 1, 0)
    }
    if (interaction)
    {
      # Dummy times the chosen variable: its value inside the category, 0 elsewhere
      tmp_dummy <- ifelse(index, data[, interaction_varnum], 0)
    }
    tmp_dummy <- data.frame(tmp_dummy)
    if (!interaction)
    {
      colnames(tmp_dummy) <- valname[i]
    }
    if (interaction)
    {
      colnames(tmp_dummy) <- valname_int[i]
    }
    data <- cbind(data, tmp_dummy)
  }
  return(data)
}
For example:
## Create fake data
fakedata <- matrix(rnorm(300),nrow = 100)
cate <- LETTERS[sample(seq(1,10),100, replace = TRUE)]
fakedata <- cbind.data.frame(cate,fakedata)
## Try this
fakedata <- makedummy(1,fakedata)
## If you need dummy*x terms to see whether the coefficient differs across categories, try this
fakedata <- makedummy(1,fakedata,interaction = TRUE,interaction_varnum = 2)
This is a bit verbose and unpolished, so any advice is welcome. Now you can perform OLS on your data.
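As a possible shortcut to building the dummy columns by hand, base R can expand a factor into dummies inside the model formula. The following is only a minimal sketch on simulated data; the names sim, cate, x and y are made up for illustration and do not come from the question.

```r
# Simulated data; all names here are illustrative
set.seed(1)
sim <- data.frame(
  cate = factor(sample(LETTERS[1:4], 100, replace = TRUE)),
  x    = rnorm(100)
)
sim$y <- 2 * sim$x + as.integer(sim$cate) + rnorm(100)

# lm() expands the factor into dummies and drops one reference level
fit_dummy <- lm(y ~ x + cate, data = sim)

# x * cate additionally includes the dummy-by-x interaction terms
fit_inter <- lm(y ~ x * cate, data = sim)

# model.matrix() shows the dummy columns that lm() builds internally
head(model.matrix(~ cate, data = sim))
```

Note that lm() uses the first factor level as the reference category, whereas makedummy leaves out the last one, so the individual dummy coefficients are measured against a different baseline.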
This question is much like these:
- fixed effects in R: plm vs lm + factor()
- Fixed Effects plm package R - multiple observations per year/id
If you would rather not create new dummies, you can use the group_indices function from the dplyr package. Although it does not support mutate, the following approach is straightforward:
fakedata$id <- fakedata %>% group_indices(comp, count)
The id variable will be your first panel dimension, so you need to set the plm index argument to index = c("id", "year").
For alternatives you can take a look at this question: R create ID within a group.
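On newer dplyr versions the same idea can be written inside mutate. The sketch below assumes dplyr 1.0.0 or later, where cur_group_id() is available; the data frame is the question's example table typed in by hand.

```r
library(dplyr)

# The question's example data, typed in by hand
dataname <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# One id per company-country pair, computed inside mutate()
dataname <- dataname %>%
  group_by(comp, count) %>%
  mutate(id = cur_group_id()) %>%
  ungroup()

# Convert back to a plain data frame before handing it to other packages
dataname <- as.data.frame(dataname)
```

Each company-country pair now has its own id, observed once per year, so id and year can serve as the two panel indices.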
If you want to control for another dimension in a within model, simply add a dummy for it:
plm(value.y ~ value.x + count, data = dataname, index = c("comp","year"))
Alternatively (especially for high-dimensional data), look at the lfe package, which can 'absorb' the additional dimension so the summary output is not polluted by the dummy variables.
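A rough sketch of the lfe approach, assuming the lfe package is installed. The panel here is simulated and its size and coefficient are made up; only the column names mirror the question.

```r
library(lfe)

# Simulated three-dimensional panel; the values are illustrative
set.seed(42)
panel <- expand.grid(
  year  = 2000:2009,
  comp  = paste0("C", 1:20),
  count = c("USA", "CAN", "MEX")
)
panel$year <- factor(panel$year)
panel$value.x <- rnorm(nrow(panel))
panel$value.y <- 1.5 * panel$value.x + rnorm(nrow(panel))

# Factors listed after "|" are projected out as fixed effects,
# so summary() reports only the coefficient on value.x
fit <- felm(value.y ~ value.x | comp + count + year, data = panel)
summary(fit)
```

If the fixed effect should be the company-country pair rather than the two separately, one option is to build a combined factor first, for example with interaction(comp, count), and absorb that instead.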
I think you can also do:
df <-transform(df, ID = as.numeric(interaction(comp, count, drop=TRUE)))
And then estimate
result <- plm(value.y ~ value.x, data = df, index = c("ID","year"))
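To make clearer what a fixed-effects (within) regression on such a combined ID estimates, here is a base-R sketch on simulated data; the values and the coefficient 0.8 are made up. It computes the slope twice: once with one dummy per company-country pair, and once by demeaning within each ID. The two should agree, and this is the same quantity plm's default "within" model targets for the individual index.

```r
# Simulated panel; the values are illustrative
set.seed(7)
df <- expand.grid(year = 2000:2004, comp = c("A", "B", "C"), count = c("USA", "CAN"))
df$value.x <- rnorm(nrow(df))
df$value.y <- 0.8 * df$value.x + rnorm(nrow(df))

# Combined individual index: one number per company-country pair
df$ID <- as.numeric(interaction(df$comp, df$count, drop = TRUE))

# (a) Dummy-variable regression: one dummy per ID
b_dummy <- coef(lm(value.y ~ value.x + factor(ID), data = df))[["value.x"]]

# (b) Within transformation: subtract each ID's mean, then regress without intercept
df$x_dm <- df$value.x - ave(df$value.x, df$ID)
df$y_dm <- df$value.y - ave(df$value.y, df$ID)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = df))[["x_dm"]]

# Compare the two slope estimates
all.equal(b_dummy, b_within)
```

The standard errors reported by the demeaned lm() are not adjusted for the absorbed ID means, so this sketch is only meant to illustrate the point estimate.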
I think you want to use lm() instead of plm(). This blog post discusses what you're after:
https://www.r-bloggers.com/r-tutorial-series-multiple-linear-regression/
For your example, I'd imagine it would look something like the following:
lm(formula = value.y ~ value.x + comp + count + year, data = dataname)
Source: https://stackoverflow.com/questions/47446534/how-to-run-regressions-on-multidimensional-panel-data-in-r