问题
I have a panel data, i.e. t
rows for each of n
observations (nxt
), such as
data("Grunfeld", package="plm")
head(Grunfeld)
firm year inv value capital
1 1935 317.6 3078.5 2.8
1 1936 391.8 4661.7 52.6
1 1937 410.6 5387.1 156.9
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
I want to make block bootstrapping, i.e. I want resample with replacement, taking a firm [i] with all the years in which it is observed. For instance, if year=1935:1937
and firm 1 is randomly drawn, I want that firm [1] will be in the new sample 3 times, corresponding to year=1935:1937
. If it is re-drawn, then it must be again 3 for times. Furthermore, I need to apply my own function to the new bootstrapped sample and I need to do this 500 times.
My current code is something like this:
library(boot)
boot.fun <- function(data) {
est.boot = myfunction(y=Grunfeld$v1, x=Grunfeld$v2, other parameters)
return(est.boot)
}
boot.sim <- function(data, mle) {
data = sample(data, ?? ) #
return(data)
}
start.time = Sys.time()
result.boot <- boot(Grunfeld, myfunction( ... ), R=500, sim = "parametric",
ran.gen = boot.sim)
Sys.time() - start.time
I was thinking to resample by specifying in a correct way data = sample(data, ?? )
as it works smooth and clean, using as index the column firm
. How could I do that? Is there any other more efficient alternative?
EDIT.
I do not necessarily need a new boot.function. I just need a (possibly fast) code which allows to resample with replacement, then I ll put it inside the boot
argument as ran.gen=code.which.works
.
The output should be a sample of the same dimension of the original, even though firms can be randomly picked twice or more (or not be picked). For instance the result could be
head(GrunfeldResampled)
firm year inv value capital
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
1 1935 317.6 3078.5 2.8
1 1936 391.8 4661.7 52.6
1 1937 410.6 5387.1 156.9
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
9 1935 317.6 3078.5 122.8
9 1936 391.8 4661.7 342.6
9 1937 410.6 5387.1 156.9
Basically I need each firm treated as a block
, and therefore the resampling should apply to the whole block. Hope this clarifies
回答1:
Apparently in this answer every firm is viewed for exactly 20 years, so I won't have a problem demonstrating:
data("Grunfeld", package="plm") #load data
Solution
#n is the the firms column, df is the dataframe
myfunc <- function(n,df) { #define function
unique_firms <- unique(n) #unique firms
sample_firms <- sample(unique_firms, size=length(unique_firms), replace=T ) #choose from unique firms randomly with replacement
new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$firm==x,] )) #fetch all years for each randomly picked firm and rbind
}
a <- myfunc(Grunfeld$firm, Grunfeld) #run function
Output
> str(a)
'data.frame': 200 obs. of 5 variables:
$ firm : int 4 4 4 4 4 4 4 4 4 4 ...
$ year : int 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 ...
$ inv : num 40.3 72.8 66.3 51.6 52.4 ...
$ value : num 418 838 884 438 680 ...
$ capital: num 10.5 10.2 34.7 51.8 64.3 67.1 75.2 71.4 67.1 60.5 ...
As you can see dim
is exactly the same as the input data.frame
For your data the solution will be:
myfunc <- function(n,df) { #define function
unique_firms <- unique(n) #unique firms
print(unique_firms)
sample_firms <- sample(unique_firms, size=length(unique_firms), replace=T ) #choose from unique firms randomly with replacement
new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$country==x,] )) #fetch all years for each randomly picked firm and rbind
}
and Output:
> str(a)
'data.frame': 848 obs. of 18 variables:
$ isocode : Factor w/ 106 levels "AGO","ALB","ARG",..: 82 82 82 82 82 82 82 82 61 61 ...
$ time : int 2 3 4 5 6 7 8 9 2 3 ...
$ country : num 80 80 80 80 80 80 80 80 59 59 ...
$ year : int 1975 1980 1985 1990 1995 2000 2005 2010 1975 1980 ...
$ gdp : num 184619 210169 199343 268870 305255 ...
$ pop : num 33.4 34.9 36.6 37.8 38.3 ...
$ gdp_k : num 5526 6022 5443 7117 7969 ...
$ co2 : num 340353 431436 426881 431052 350874 ...
$ co2_k : num 10191 12333 11674 11407 9128 ...
$ oecd : int 1 1 1 1 1 1 1 1 1 1 ...
$ LI : int 0 0 0 0 0 0 0 0 0 0 ...
$ LMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ UMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ HI : int 1 1 1 1 1 1 1 1 1 1 ...
$ gdpk : num 5531 6018 5449 7118 7971 ...
$ co2k : num 10196 12355 11668 11412 9162 ...
$ co2_k.lag: num 8595 10191 12333 11674 11407 ...
$ gdp_k.lag: num 4730 5526 6022 5443 7117 ...
回答2:
You can do this with the "strata" parameter of the boot function. This is called stratified bootstrap. Changing the last line of your code:
result.boot <- boot(Grunfeld, boot.fun, R=500, sim = "ordinary",
strata = Grunfeld$firm)
i suppressed the parameter ran.gen & sim
I suggest theses changes to the boot function so it works properly:
boot.fun <- function(d, i) { # d being your data, i the set of indices)
est.boot = myfunction(y=d[i ,]$v1, x=d[i, ]$v2, other parameters)
return(est.boot)
}
来源:https://stackoverflow.com/questions/28150484/block-sampling-according-to-index-in-panel-data