coxph() X matrix deemed to be singular

Posted by 南楼画角 on 2019-11-30 11:36:16

Here's a simple example which seems to reproduce your problem:

> library(survival)
> (df1 <- data.frame(t1=1:6,
                    s1=rep(c(0, 1), 3),
                    te1=c(rep(0, 3), rep(1, 3)),
                    pa1=c(0,0,1,0,0,0)
                    ))
   t1 s1 te1 pa1
 1  1  0   0   0
 2  2  1   0   0
 3  3  0   0   1
 4  4  1   1   0
 5  5  0   1   0
 6  6  1   1   0

> (coxph(Surv(t1, s1) ~ te1*pa1, data=df1))
Call:
coxph(formula = Surv(t1, s1) ~ te1 * pa1, data = df1)


        coef exp(coef) se(coef)         z  p
te1      -23  9.84e-11    58208 -0.000396  1
pa1      -23  9.84e-11   100819 -0.000229  1
te1:pa1   NA        NA        0        NA NA

Now let's look for 'perfect classification' like so:

> (xtabs( ~ s1+te1, data=df1))
   te1
s1  0 1
  0 2 1
  1 1 2
> (xtabs( ~ s1+pa1, data=df1))
   pa1
s1  0 1
  0 2 1
  1 3 0

Note that a value of 1 for pa1 exactly predicts a status s1 equal to 0. That is to say, based on your data, if you know that pa1==1 then you can be sure that s1==0. In this setting the coefficient for pa1 has no finite maximum likelihood estimate (the fit keeps drifting toward minus infinity), so Cox's model is not appropriate for this predictor and fitting it produces numerical warnings. This can be seen with

> coxph(Surv(t1, s1) ~ pa1, data=df1)

giving

Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1 ; beta may be infinite. 

It's important to look at these cross tables before fitting models. Also it's worth starting with simpler models before considering those involving interactions.
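The cross-table check can be automated when there are several predictors. The following is a minimal sketch in base R, assuming a binary status column and discrete predictors; `no_variation` is a hypothetical helper written for this example, not a function from the survival package.

```r
# Same toy data as in the example above
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Hypothetical helper: for each predictor, return the levels at which
# the status takes only one value (a zero cell in the cross table)
no_variation <- function(data, status, predictors) {
  out <- lapply(predictors, function(p) {
    tab <- table(data[[status]], data[[p]])
    # keep the predictor levels whose column contains a zero count
    colnames(tab)[apply(tab == 0, 2, any)]
  })
  setNames(out, predictors)
}

flagged <- no_variation(df1, "s1", c("te1", "pa1"))
print(flagged)
```

Here te1 comes back empty and pa1 comes back with level "1", which matches the two cross tables shown earlier.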

If we add the interaction term to df1 manually like this:

> (df1 <- within(df1,
+               te1pa1 <- te1*pa1))
  t1 s1 te1 pa1 te1pa1
1  1  0   0   0      0
2  2  1   0   0      0
3  3  0   0   1      0
4  4  1   1   0      0
5  5  0   1   0      0
6  6  1   1   0      0

Then check it with

> (xtabs( ~ s1+te1pa1, data=df1))
   te1pa1
s1  0
  0 3
  1 3

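We can see that it's a useless classifier, i.e. it does not help predict status s1. The interaction column is 0 for every row, so it carries no information at all, which is why coxph returns NA for te1:pa1.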
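One way to spot this kind of column before fitting is to inspect the design matrix directly. The sketch below uses base R only; it is an illustration of the idea and is not meant to reproduce the internal singularity check that coxph performs.

```r
# Same toy data as in the example above
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Design matrix for te1 * pa1; drop the intercept column,
# since a Cox model has no intercept term
X <- model.matrix(~ te1 * pa1, data = df1)[, -1]

# Columns with zero variance are constant and cannot be estimated
col_var  <- apply(X, 2, var)
zero_var <- names(which(col_var == 0))
print(zero_var)

# The numerical rank is lower than the number of columns
print(c(rank = qr(X)$rank, columns = ncol(X)))
```

With df1 the only zero-variance column is te1:pa1, and the matrix has rank 2 with 3 columns.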

When combining all 3 terms, the fitter does manage to produce a numerical value for te1 and pa1 even though pa1 is a perfect predictor as above. However, a look at the coefficients (about -23) and their standard errors (58208 and 100819) shows them to be implausible.

Edit @JMarcelino: If you look at the output from the first coxph model in the example, you'll see the warning message:

2: In coxph(Surv(t1, s1) ~ te1 * pa1, data = df1) :
  X matrix deemed to be singular; variable 3

This is likely the same warning you're getting, and it is due to this problem of classification. Also, your third cross table xtabs(~ tecnologia+pais, data=dados) is not as important as the table of status by interaction term. You could add the interaction term manually first, as in the example above, and then check the cross table. Or you could say:

> with(df1,
       table(s1, pa1te1=pa1*te1))
   pa1te1
s1  0
  0 3
  1 3

That said, I notice one of the cells in your third table has a zero (conv, PT) meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
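Your data (dados) is not available here, so as a sketch the same check can be run on the toy df1, which happens to have an empty predictor combination too. It only uses xtabs and which from base R.

```r
# Same toy data as in the example above
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Cross table of the two predictors (status is not involved here)
pred_tab <- xtabs(~ te1 + pa1, data = df1)
print(pred_tab)

# Row/column positions of combinations with no observations
empty <- which(pred_tab == 0, arr.ind = TRUE)
print(empty)
```

In df1 no row has te1==1 and pa1==1, so that cell is empty and the interaction cannot be estimated.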

In general, the outcome should take some values at every level of the predictors, and the predictors should not classify the outcome as exactly all or nothing or 50/50.

Edit 2 @user75782131 Yes, generally speaking xtabs or a similar cross table should be checked for models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (binary outcome) as well as Cox's model.
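To illustrate the logistic regression case, here is a rough sketch that treats s1 from the toy df1 as a plain binary outcome and ignores the time variable. This is only for demonstration and is not a substitute for the survival analysis. Depending on the data, R may or may not emit a warning; the more reliable symptoms are a large coefficient with a very large standard error.

```r
# Same toy data as in the example above
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Logistic regression of status on pa1; pa1 == 1 always has s1 == 0
fit <- glm(s1 ~ pa1, family = binomial, data = df1)

# Estimate and standard error for pa1
est <- summary(fit)$coefficients["pa1", c("Estimate", "Std. Error")]
print(est)
```

The estimate for pa1 is strongly negative and its standard error is far larger than the estimate itself, the same pattern as in the coxph output above.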
