f argument of survey package does not give expected output

Submitted by 佐手、 on 2020-08-10 20:13:48

Question


This is a follow-up to "R's survey package interpolation handling for median estimates", which has not attracted much feedback. I have managed to boil the issue down to the following:

I'm using R's survey package to estimate the median for a set of data. The data needed to replicate this issue are available as dput text here.

The design I'm using is an object of class svyrep.design, defined as follows:

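library(survey)
# Stratified design with clusters given by id_directorio and weights f_pers
design <- svydesign(id = ~id_directorio, strata = ~estrato, weights = ~f_pers, check.strata = TRUE, data = datos)
set.seed(234262762)
# Convert to a replicate-weight design with 20 subbootstrap replicates
repdesign <- as.svrepdesign(design, type = "subbootstrap", replicates=20)
options(survey.lonely.psu="adjust")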

A svyquantile inside a svyby does the job as expected:

svyby(formula = ~ing_t_p, by = ~CL_GRUPO_OCU_08, repdesign, svyquantile, quantiles=c(0.5),  method="constant", 
      f = 0.5, ties = "rounded", vartype=c("ci", "se"), ci=TRUE, na.rm=FALSE)

         CL_GRUPO_OCU_08         V1        se         cv        cv%
ISCO08_1        ISCO08_1 1002513.04 269630.31 0.26895442  26.895442
ISCO08_2        ISCO08_2  744505.53  68827.09 0.09244672   9.244672
ISCO08_3        ISCO08_3  489789.32  42839.16 0.08746447   8.746447
ISCO08_4        ISCO08_4  449806.52  69526.34 0.15456944  15.456944
ISCO08_5        ISCO08_5  286705.37  13392.01 0.04671002   4.671002
ISCO08_6        ISCO08_6  449613.04       NaN        NaN        NaN
ISCO08_7        ISCO08_7   93032.83 109534.62 1.17737600 117.737600
ISCO08_8        ISCO08_8  564514.15 437752.31 0.77544967  77.544967
ISCO08_9        ISCO08_9  293712.84  24497.97 0.08340790   8.340790

However, look at the estimate for category ISCO08_6. It is not giving the expected median. Instead, it shows the smaller of the two values:

datos %>% filter(CL_GRUPO_OCU_08 == "ISCO08_6")

# A tibble: 2 x 5
  id_directorio estrato f_pers ing_t_p CL_GRUPO_OCU_08
          <dbl>   <dbl>  <dbl>   <dbl> <chr>          
1         24568    2021   98.7 449613. ISCO08_6       
2         24568    2021   98.7 551525. ISCO08_6    

The f argument should deal with this (it controls interpolation between data points), and indeed it does for all the other categories, but it has no effect on the ISCO08_6 row. I have found that this issue affects estimates where there are only 2 or 4 data points.

So how do I get the median using this method when the number of data points is small?


Answer 1:


Ok, it looks as though you need to ask for a quantile very slightly larger than 0.5 to get what you want. I will look into whether this is a bug or whether it was necessary to get agreement with some other system like SUDAAN. I will either fix or document this for the next version (or perhaps add yet another option). Quantiles are the worst.

Here are examples just using svyquantile():

> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=0.5, ties="rounded", method="constant")
             0.5
ing_t_p 500569.2
> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=0, ties="rounded", method="constant")
           0.5
ing_t_p 449613
> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=1, ties="rounded", method="constant")
             0.5
ing_t_p 551525.3
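
The three results above match what base R's approxfun() does with method="constant": between two knots it returns y0*(1-f) + y1*f, but at a point that is exactly a knot it returns that knot's value and f is never consulted. The sketch below is only an illustration under assumptions, not the survey package's own code. It assumes the point estimate behaves like approxfun() applied to the weighted CDF of the two ISCO08_6 observations, and it uses the values and weights as rounded in the printed output (449613, 551525.3, 98.7).

```r
# Values and weights for ISCO08_6, rounded as printed in this thread (assumption).
x <- c(449613, 551525.3)
w <- c(98.7, 98.7)

# Weighted empirical CDF at each ordered data point: exactly 0.5 and 1.
cdf <- cumsum(w) / sum(w)

# Inverse CDF as a step function; f blends the left and right neighbours.
# q_f is a hypothetical helper name used only for this illustration.
q_f <- function(f) approxfun(cdf, x, method = "constant", f = f)

# p = 0.5 coincides with the first knot, so f plays no role there.
at_half <- q_f(0.5)(0.5)

# Slightly above 0.5 the point lies between the knots, so f takes effect.
above <- sapply(c(0, 0.5, 1), function(f) q_f(f)(0.5000001))

print(at_half)
print(above)
```

With equal weights the CDF hits 0.5 exactly at the first observation, which would explain why a quantile of exactly 0.5 returns the lower value while 0.5000001 returns the interpolated one. The same coincidence can occur with 4 equally weighted points.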

And here using svyby(). Note that you have to name the first argument with formula=; otherwise R's partial argument matching interprets f=0.5 as formula=0.5.

> svyby(formula=~ing_t_p, by = ~CL_GRUPO_OCU_08, design, svyquantile, quantiles=c(0.5000001),f=0.5, method="constant", vartype=c("ci", "se"), ci=TRUE, na.rm.all=FALSE)
         CL_GRUPO_OCU_08    ing_t_p        se      ci_l      ci_u
ISCO08_1        ISCO08_1 1002513.04 254418.31 550769.11 1629454.6
ISCO08_2        ISCO08_2  749355.06  62294.16 649720.53  899613.0
ISCO08_3        ISCO08_3  489789.32  32140.54 409819.42  538808.8
ISCO08_4        ISCO08_4  449806.52  74549.55 349699.00  650000.0
ISCO08_5        ISCO08_5  286705.37  15349.64 240706.43  301766.1
ISCO08_6        ISCO08_6  500569.18       NaN       NaN       NaN
ISCO08_7        ISCO08_7   93032.83 108653.60  55000.00  503500.0
ISCO08_8        ISCO08_8  564514.15 429428.77  80470.95 2061000.0
ISCO08_9        ISCO08_9  293712.84  18830.76 245000.00  320539.5
There were 12 warnings (use warnings() to see them)
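
The formula= caveat comes from R's partial matching of argument names, which can be reproduced without the survey package. In the sketch below, demo_by() is a hypothetical stand-in that only copies the leading formals of svyby() (formula, by, design, FUN, ...); it is not the real function and does no estimation.

```r
# Hypothetical stand-in with the same leading formals as svyby().
demo_by <- function(formula, by, design, FUN, ...) {
  extras <- list(...)
  list(formula = formula, extra_names = names(extras))
}

# First argument unnamed: f = 0.5 partially matches `formula`,
# and the unnamed ~ing_t_p is pushed along to `by`.
r1 <- demo_by(~ing_t_p, ~grp, "design", "fun", f = 0.5)

# All leading arguments named: `formula` is already taken by an exact match,
# so f = 0.5 falls through to `...` and would reach the inner function.
r2 <- demo_by(formula = ~ing_t_p, by = ~grp, design = "design",
              FUN = "fun", f = 0.5)

print(r1$formula)      # the number 0.5, not a formula
print(r2$extra_names)  # "f"
```

Partial matching is case-sensitive, so f cannot match FUN; formula is the only formal it can be mistaken for.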


Source: https://stackoverflow.com/questions/62452042/f-argument-of-survey-package-does-not-give-expected-output
