Question
I've spent the whole of today battling to format my data appropriately for mlogit (updated after finding a bug via BondedDust's table(TM) suggestion):
raw <-read.csv("C:\\Users\\Andy\\Desktop\\research\\Oxford\\Prefs\\rData.csv", header=T, row.names = NULL,id="id")
raw <-na.omit(raw)
library(mlogit)
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.var = "dishId", chid.var = "individuals", drop.index = TRUE)
Where I fail is when trying to model my data.
model <- mlogit(selected ~ food + plate | sex + age +hand, data = TM)
Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 6.26659e-18
I would really appreciate some help on the topic. Afraid I'm going a little bananas with it.
The data itself is from an experiment where we get 1000s of people to decide between pairs of plates of food (we vary how the food looks - either Angular or Circular - and vary how the plate is shaped - is either Angular or Circular).
With best wishes, Andy.
PS Afraid I'm a newbie with statistic Qs on StackOverflow.
Answer 1:
The model cannot interpret your dishId as the alternative index (alt.var) because you have different key pairs for different choices. For example, you have "TS" and "RS" as alternative index keys for the first choice in your .csv file, but "RR" and "RS" as keys for choice 3634. Additionally, you did not specify the names of the alternatives (alt.levels). Because alt.levels is not filled in, mlogit.data automatically tries to detect the alternatives from the alternative index, which it cannot interpret correctly. This is basically where everything goes wrong: the 'food' and 'plate' variables are not interpreted as alternatives but are treated as individual-specific variables, which eventually cause the singularity issues.
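One way to see the inconsistent key pairs for yourself is to tabulate which alternative keys occur together in each choice situation. The following is only a minimal sketch in base R: the column names individuals and dishId come from the question, but the toy data frame and its values are made up for illustration and stand in for the real rData.csv.

```r
# Toy data in long format: two rows (alternatives) per choice situation.
# Column names follow the question; the values are invented for illustration.
toy <- data.frame(
  individuals = c(1, 1, 2, 2, 3, 3),
  dishId      = c("TS", "RS", "TS", "RS", "RR", "RS"),
  stringsAsFactors = FALSE
)

# Collapse the alternative keys of each choice situation into one label.
key_sets <- tapply(toy$dishId, toy$individuals,
                   function(k) paste(sort(k), collapse = "/"))

# Count how often each key pair occurs across choice situations.
print(table(key_sets))

# More than one distinct pair means the keys are not consistent across choices.
inconsistent <- length(unique(key_sets)) > 1
print(inconsistent)
```

If the table shows a single key pair, the alternative index is consistent; several different pairs, as in this toy example, point to the situation described above.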
You have two options to fix the issue. You can give the actual alternatives as input to mlogit.data through the alt.levels parameter:
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.levels = c("food","plate"),chid.var = "individuals",drop.index=TRUE)
model1 <- mlogit(selected ~ food + plate | sex + age +hand, data = TM)
Alternatively, you could make your index keys consistent so that you can pass them via alt.var. mlogit.data will then be able to guess correctly what your alternatives are:
raw[,3] <- rep(1:2,nrow(raw)/2) # use 1 and 2 as unique alternative keys for all choices
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.var="dishId", chid.var = "individuals")
model2 <- mlogit(selected ~ food + plate | sex + age + hand, data = TM)
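Note that rep(1:2, nrow(raw)/2) only produces sensible keys if the rows are sorted so that the two alternatives of each choice sit next to each other. As a hedged alternative, the sketch below numbers the rows within each choice situation using base R's ave(). The toy data and the column name altId are hypothetical and only serve as illustration; the rows are deliberately shuffled.

```r
# Toy long-format data with shuffled rows; column names follow the question,
# values are invented for illustration.
toy <- data.frame(
  individuals = c(10, 11, 10, 12, 11, 12),
  dishId      = c("TS", "TS", "RS", "RR", "RS", "RS"),
  stringsAsFactors = FALSE
)

# Number the rows within each choice situation (1, 2 per individual).
# ave() applies seq_along() group by group, so the row order across
# groups does not matter. 'altId' is a hypothetical column name.
toy$altId <- ave(seq_len(nrow(toy)), toy$individuals, FUN = seq_along)

# Sanity check: every choice situation should have exactly two rows.
rows_per_choice <- table(toy$individuals)
stopifnot(all(rows_per_choice == 2))

print(toy)
```

A column built this way could then be passed as alt.var in place of the overwritten dishId column, under the assumption that every choice situation really has exactly two alternatives.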
We verify that both models are indeed identical. The results of model 1:
> summary(model1)
Call:
mlogit(formula = selected ~ food + plate | sex + age + hand,
data = TM, method = "nr", print.level = 0)
Frequencies of alternatives:
   food   plate
0.42847 0.57153
nr method
4 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00423
successive function values within tolerance limits
Coefficients :
                    Estimate Std. Error t-value  Pr(>|t|)
plate:(intercept) -0.0969627  0.0764117 -1.2689 0.2044589
foodCirc           1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc         -0.0064866  0.0524547 -0.1237 0.9015835
plate:sexmale     -0.0811157  0.0416113 -1.9494 0.0512512 .
plate:age16-34     0.1622542  0.0469167  3.4583 0.0005435 ***
plate:age35-54     0.0312484  0.0555634  0.5624 0.5738492
plate:age55-74     0.0556696  0.0836248  0.6657 0.5055987
plate:age75+       0.1057646  0.2453797  0.4310 0.6664508
plate:handright   -0.0177260  0.0539510 -0.3286 0.7424902
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -8284.6
McFadden R^2: 0.097398
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)
Versus the results of model 2. Note that the alternatives are correctly identified, but the names are not explicitly added to the model:
> summary(model2)
Call:
mlogit(formula = selected ~ food + plate | sex + age + hand,
data = TM, method = "nr", print.level = 0)
Frequencies of alternatives:
      1       2
0.42847 0.57153
nr method
4 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00423
successive function values within tolerance limits
Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)
2:(intercept) -0.0969627  0.0764117 -1.2689 0.2044589
foodCirc       1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc     -0.0064866  0.0524547 -0.1237 0.9015835
2:sexmale     -0.0811157  0.0416113 -1.9494 0.0512512 .
2:age16-34     0.1622542  0.0469167  3.4583 0.0005435 ***
2:age35-54     0.0312484  0.0555634  0.5624 0.5738492
2:age55-74     0.0556696  0.0836248  0.6657 0.5055987
2:age75+       0.1057646  0.2453797  0.4310 0.6664508
2:handright   -0.0177260  0.0539510 -0.3286 0.7424902
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -8284.6
McFadden R^2: 0.097398
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)
Answer 2:
This is more of a comment than an answer (I don't have enough reputation points to comment!). However, I wasn't able to reproduce your code, as there isn't any age column in your rData.csv.
Source: https://stackoverflow.com/questions/29849640/r-mlogit-model-computationally-singular