R's survey package interpolation handling for median estimates

Submitted by 两盒软妹~` on 2020-06-29 03:57:12

Question


I'm reposting the question asked here hoping maybe to get a little more visibility.

This is a question about Lumley's survey package for R. Specifically, it concerns how the package handles interpolation in median estimation, which I have not been able to resolve after several hours of looking into the matter.

I'm using a svyrep design which has the following form:

design <- svydesign(id = ~id_directorio, strata = ~estrato, weights = ~f_pers, check.strata = TRUE, data = datos) 
options(survey.lonely.psu="remove")
set.seed(234262762)
SB2K_2 = as.svrepdesign(design, type = "subbootstrap", replicates=2000)

When I compute the median by calling svyquantile through svyby, I get a wrong median estimate for any group whose sample size is small:

svyby(~ing_t_p, by = ~CL_REGION + ~CL_GRUPO_OCU_08, subset(SB2K_2, ocup_ref==1 & CL_REGION == "CHL02" & sexo == 2), 
                       svyquantile, quantiles=c(0.5), method = "constant")

                   CL_REGION CL_GRUPO_OCU_08         V1        se
    CHL02.ISCO08_1     CHL02        ISCO08_1 1005886.00 409590.92
    CHL02.ISCO08_2     CHL02        ISCO08_2  749355.06  44882.23
    CHL02.ISCO08_3     CHL02        ISCO08_3  490000.00  14406.91
    CHL02.ISCO08_4     CHL02        ISCO08_4  450000.00  92620.61
    CHL02.ISCO08_5     CHL02        ISCO08_5  289750.62  16685.00
    CHL02.ISCO08_6     CHL02        ISCO08_6  449613.04       NaN #This is the row with a "wrong" median (V1)
    CHL02.ISCO08_7     CHL02        ISCO08_7   95535.84 123539.27
    CHL02.ISCO08_8     CHL02        ISCO08_8  599484.05 356666.34
    CHL02.ISCO08_9     CHL02        ISCO08_9  299742.02  17933.51

The row where the median is 449613 contains only two observations, but instead of the midpoint between the two, the output shows the smaller value. Both observations have the same weight, so the correct median would be 500569:

datos %>% filter(CL_REGION == "CHL02" & sexo == 2 & CL_GRUPO_OCU_08 == "ISCO08_6") %>% select(ing_t_p, f_pers)
# A tibble: 2 x 2
  ing_t_p f_pers
    <dbl>  <dbl>
1 449613.   98.7
2 551525.   98.7
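
To make the discrepancy concrete, here is a minimal base-R sketch of the two rules applied to these two rows. It is not the survey package's implementation, only an illustration under the assumption that a step-function ("constant") rule picks the first value whose cumulative weight share reaches 0.5. The variable names are made up for the example, and the incomes are the rounded values shown in the tibble.

```r
# Incomes and weights copied (rounded) from the two-row tibble above.
x <- c(449613, 551525)
w <- c(98.7, 98.7)

# Sort by income and compute the cumulative share of the total weight.
ord <- order(x)
x_sorted <- x[ord]
cum_share <- cumsum(w[ord]) / sum(w)

# Step-function rule (assumed): first value whose cumulative share reaches 0.5.
median_step <- x_sorted[which(cum_share >= 0.5)[1]]

# Midpoint rule: with two equally weighted points, average them.
median_mid <- mean(x_sorted)

print(c(step = median_step, midpoint = median_mid))
```

With equal weights the first observation already carries exactly half of the total weight, so the step rule stops at 449613, while the midpoint rule gives 500569.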

I asked Professor Lumley himself, and he kindly pointed me to the f argument of svyquantile, which controls interpolation between data points. In this case, f = 0.5 should give me the midpoint, but it does not work and produces an error message:

svyby(~ing_t_p, by = ~CL_REGION + ~CL_GRUPO_OCU_08, subset(SB2K_2, ocup_ref==1 & CL_REGION == "CHL02" & sexo == 2), 
                     svyquantile, quantiles=c(0.5), method = "constant", f = 0.5)
Error in eval(predvars, data, env) : object 'ing_t_p' not found

Why do I get this error? How can I get the correct median estimates with the survey package when the groups are small?
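
One possible explanation, not confirmed anywhere in this post, is R's partial argument matching. svyby is documented with a leading formula parameter, so a named f = 0.5 may be matched to formula instead of being passed on to svyquantile, which would shift the positional arguments by one place. The sketch below uses a hypothetical stand-in function, toy_by, rather than svyby itself, to show the matching rule.

```r
# toy_by is a hypothetical stand-in, NOT the real svyby. It only mimics
# a signature with `formula`, `by` and `design` placed before `...`.
toy_by <- function(formula, by, design, ...) {
  list(formula = formula, by = by, design = design, dots = list(...))
}

# `by` matches exactly, `f` partially matches `formula`, and the
# unnamed arguments then fill the remaining slots in order.
res <- toy_by(~x, by = ~g, "design_obj", f = 0.5)

print(res$formula)        # 0.5 ended up here instead of reaching `...`
print(class(res$design))  # the formula ~x landed in the design slot
```

If this is what happens in the real call, the formula ~ing_t_p is no longer evaluated against the survey data, which would be consistent with an "object not found" error.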

EDIT:

To narrow the problem down, I checked that it also arises with the plain svydesign object (without the svyrep.design):

svyby(~ing_t_p, ~CL_REGION + ~CL_GRUPO_OCU_08, subset(design, ocup_ref==1 & CL_REGION == "CHL02" & sexo == 2), 
      svyquantile, quantiles=c(0.5), ci = TRUE)
               CL_REGION CL_GRUPO_OCU_08    ing_t_p        se
CHL02.ISCO08_1     CHL02        ISCO08_1 1005262.68 248216.08
CHL02.ISCO08_2     CHL02        ISCO08_2  749355.06  62219.18
CHL02.ISCO08_3     CHL02        ISCO08_3  489643.22  33507.74
CHL02.ISCO08_4     CHL02        ISCO08_4  449997.64  74549.55
CHL02.ISCO08_5     CHL02        ISCO08_5  284307.34  15408.06
CHL02.ISCO08_6     CHL02        ISCO08_6  449613.04       NaN
CHL02.ISCO08_7     CHL02        ISCO08_7   93033.74 109500.28
CHL02.ISCO08_8     CHL02        ISCO08_8  547251.67 429428.77
CHL02.ISCO08_9     CHL02        ISCO08_9  296445.55  18053.37 

Source: https://stackoverflow.com/questions/62306784/rs-survey-package-interpolation-handling-for-median-estimates
