Question
I have a weird problem when estimating a random effects model with the plm package in R.
Here is a link to a dput of part of my data: https://pastebin.com/raw/mTdh26dg
My code is:
library(plm)
library(haven)
pmales <- pdata.frame(males_part, index = c("NR", "YEAR"))
random <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + BLACK + HISP + MAR + UNION + RUR + NE + NC + S + factor(YEAR),
data = pmales, model = "random")
The reason I included library(haven) is that my original data set is a .dta file.
When I run this code I get this error:
Error in is.pbalanced.default(x) :
argument "y" is missing, with no default
The weird thing is that if I start with a clean R session and don't load haven (and then import the data from the dput), I don't get this error. I do get the error if I import from the dput but load haven anyway. I also don't get the error when estimating within or pooling models (even with haven loaded).
Here is my sessionInfo():
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.3
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nl_NL.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] haven_2.2.0 plm_2.2-3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 rstudioapi_0.11 Formula_1.2-3 magrittr_1.5 hms_0.5.3 MASS_7.3-51.5 lattice_0.20-41 rlang_0.4.5
[9] bibtex_0.4.2.2 fansi_0.4.1 stringr_1.4.0 tools_3.6.3 grid_3.6.3 nlme_3.1-144 cli_2.0.2 ellipsis_0.3.0
[17] maxLik_1.3-8 miscTools_0.6-26 assertthat_0.2.1 lmtest_0.9-37 digest_0.6.25 lifecycle_0.2.0 tibble_3.0.0 crayon_1.3.4
[25] bdsmatrix_1.3-4 vctrs_0.2.4 Rdpack_0.11-1 gbRd_0.4-11 glue_1.4.0 sandwich_2.5-1 stringi_1.4.6 pillar_1.4.3
[33] compiler_3.6.3 forcats_0.5.0 pkgconfig_2.0.3 zoo_1.8-7
Is this a bug in plm or haven? Or some sort of incompatibility between the two (or their dependencies)?
Answer 1:
I think the issue is that your data males_part is a tibble, but you don't have the tibble package loaded until you attach haven. If you don't have tibble loaded, then you won't have any methods for the tibble classes "tbl_df" and "tbl", and it will act exactly like a data frame. Once tibble is loaded, it will start to act like a tibble.
This is an issue because tibbles and data frames aren't identical, but the class of a tibble includes "data.frame". I'd guess what's happening is that plm assumes that extracting a single column from a data frame gives a vector, but with a tibble, it gives another tibble.
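To make the difference concrete, here is a minimal sketch of single-column extraction on a data frame versus a tibble. It assumes the tibble package is installed; the toy columns id and wage are made up for illustration and are not taken from the question's data.

```r
# Toy data; the column names are illustrative assumptions.
df <- data.frame(id = 1:3, wage = c(1.2, 1.5, 1.9))
tb <- tibble::as_tibble(df)   # calling tibble:: loads its methods

# A tibble still claims to be a data frame.
inherits(tb, "data.frame")    # TRUE

# Single-column extraction with [ , ] differs between the two.
is.vector(df[, "wage"])       # TRUE: a data frame drops to a vector
is.data.frame(tb[, "wage"])   # TRUE: a tibble stays a one-column tibble

# Either of these returns a plain vector for both classes.
tb[, "wage", drop = TRUE]
tb[["wage"]]
```

Code that expects a vector from the [ , ] form will therefore receive a one-column tibble once the tibble methods are registered, which is the kind of mismatch the guess above describes.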
The workaround for you is pretty simple. Just use males_part <- as.data.frame(males_part) to remove the tibble class, and then haven won't matter.
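A minimal sketch of that workaround follows. The object is built with structure() in the style that dput() typically emits for a tibble, so it carries the tibble classes even if tibble itself is not loaded; the four rows of values are invented for illustration and are not the real data.

```r
# Stand-in for the dput import: values are made up, class vector mimics a tibble.
males_part <- structure(
  list(NR   = c(13, 13, 17, 17),
       YEAR = c(1980, 1981, 1980, 1981),
       WAGE = c(1.20, 1.85, 1.68, 1.52)),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame")
)

inherits(males_part, "tbl_df")   # TRUE: still tagged as a tibble

# Strip the tibble classes before handing the data to plm.
males_part <- as.data.frame(males_part)

class(males_part)                # "data.frame"

# The pdata.frame() and plm() calls from the question would follow unchanged.
```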
Conceivably this is worth reporting to the maintainer of plm. It's a design flaw in tibble that is causing the problem (if tibbles inherit from data.frame, they should act like data frames), but tibbles are pretty common nowadays, and that design is unlikely to change. The plm package could protect itself against this by putting data <- as.data.frame(data) early in the pdata.frame function, or by protecting every column extraction with drop = TRUE.
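As a rough illustration of that kind of guard, here is a sketch of a function that coerces its input up front and also extracts columns with drop = TRUE. The function column_range is hypothetical and is not part of plm; it only shows the pattern, not how plm is actually written.

```r
# Hypothetical helper (not from plm): defends against tibble input.
column_range <- function(data, column) {
  data <- as.data.frame(data)              # guard 1: drop any tibble classes
  values <- data[, column, drop = TRUE]    # guard 2: always extract a vector
  range(values)
}

# Toy input tagged with the tibble classes; the values are made up.
toy <- data.frame(WAGE = c(1.2, 1.9, 1.5))
class(toy) <- c("tbl_df", "tbl", "data.frame")

column_range(toy, "WAGE")   # 1.2 1.9
```

Either guard alone would be enough here; using both just shows the two options side by side.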
Source: https://stackoverflow.com/questions/61249692/error-when-estimating-random-effects-model-with-plm-package-when-haven-is-loaded