Methods in R for large complex survey data sets?

面向向阳花 2021-02-09 05:44

I am not a survey methodologist or demographer, but am an avid fan of Thomas Lumley's R survey package. I've been working with a relatively large complex survey data set, the

2 answers
  •  时光说笑
    2021-02-09 06:12

    for huge data sets, linearized designs (svydesign) are much slower than replication designs (svrepdesign). you cannot use linearization for this task. instead, review the weighting functions called inside survey::as.svrepdesign and use one of them to make a replication design directly. you are likely better off not even calling as.svrepdesign itself, just the functions within it.

    for one example that feeds cluster=, strata=, and fpc= directly into a replicate-weighted design, see

    https://github.com/ajdamico/asdfree/blob/master/Censo%20Demografico/download%20and%20import.R#L405-L429
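as a rough illustration of the same idea (not the code at that link), here is a minimal sketch that builds a bootstrap replicate design without ever creating a svydesign object. the toy data frame and its column names (stratum, psu, wt, y) are made up for this example, bootstrap weights are just one of the available choices, and fpc= is left out to keep it short.

```r
library(survey)

set.seed(1)

# hypothetical toy data: 4 strata x 5 PSUs per stratum x 10 respondents per PSU
toy <- expand.grid(person = 1:10, psu_in_stratum = 1:5, stratum = 1:4)
toy$psu <- paste(toy$stratum, toy$psu_in_stratum, sep = "_")  # PSU ids unique across strata
toy$wt  <- 100                                                # equal sampling weights
toy$y   <- rnorm(nrow(toy), mean = 50, sd = 10)

# build the bootstrap replicate factors straight from the strata and PSU vectors.
# compress = FALSE returns a plain matrix with one row per respondent
bw <- bootweights(
  strata     = toy$stratum,
  psu        = toy$psu,
  replicates = 20,
  compress   = FALSE
)

# hand the factors to svrepdesign. combined.weights = FALSE because the
# replicate factors have not yet been multiplied by the sampling weights
boot_design <- svrepdesign(
  data             = toy,
  weights          = ~wt,
  repweights       = bw$repweights,
  type             = "bootstrap",
  combined.weights = FALSE,
  scale            = bw$scale,
  rscales          = bw$rscales
)

est <- svymean(~y, boot_design)
print(est)
```

on a real file you would swap in your own strata, cluster, and weight columns, and pick whichever weighting function matches the design you would otherwise have passed to as.svrepdesign.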

    note you can also view minute-by-minute speed tests (with timestamps for each event) here http://monetdb.cwi.nl/testweb/web/eanthony/

    also note that the replicates= argument is almost entirely responsible for how fast the design runs. so perhaps make two designs, one for coefficients (with just a couple of replicates) and another for SEs (with as many as you can tolerate). run your coefficients interactively during the day to work out which numbers you need, then leave the bigger processes that require SE calculations running overnight.
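a minimal sketch of that two-design workflow, again on made-up toy data. make_boot_design is a hypothetical helper written for this example, not a survey function, and the replicate counts (4 and 200) are arbitrary. the point estimate only uses the full-sample weights, so both designs return the same number and only the standard error depends on the replicates.

```r
library(survey)

set.seed(42)

# hypothetical toy data: 4 strata x 5 PSUs per stratum x 10 respondents per PSU
toy <- expand.grid(person = 1:10, psu_in_stratum = 1:5, stratum = 1:4)
toy$psu <- paste(toy$stratum, toy$psu_in_stratum, sep = "_")
toy$wt  <- runif(nrow(toy), min = 50, max = 150)
toy$y   <- rnorm(nrow(toy), mean = 50, sd = 10)

# hypothetical helper: bootstrap replicate design with n_rep replicates
make_boot_design <- function(n_rep) {
  bw <- bootweights(
    strata     = toy$stratum,
    psu        = toy$psu,
    replicates = n_rep,
    compress   = FALSE
  )
  svrepdesign(
    data             = toy,
    weights          = ~wt,
    repweights       = bw$repweights,
    type             = "bootstrap",
    combined.weights = FALSE,
    scale            = bw$scale,
    rscales          = bw$rscales
  )
}

quick_design <- make_boot_design(4)    # fast: point estimates only
full_design  <- make_boot_design(200)  # slow: use this one for SEs

quick_est <- svymean(~y, quick_design)
full_est  <- svymean(~y, full_design)

# same point estimate from both designs
print(all.equal(as.numeric(coef(quick_est)), as.numeric(coef(full_est))))

# only the SE from the larger design should be reported
print(SE(full_est))
```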
