Question
It is common in mode choice models to have variables that vary across alternatives ("generic variables") but that are undefined for certain modes. For example, transit fare is present for bus and light rail, but undefined for automobiles and biking. Note that the fare for those modes is undefined, not zero.
I'm trying to make this work with the mlogit package for R. In this MWE I've asserted that price is undefined for fishing from the beach. This results in a singularity error.
library(mlogit)
#> Warning: package 'mlogit' was built under R version 3.5.2
#> Loading required package: Formula
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
#> Loading required package: lmtest
data("Fishing", package = "mlogit")
Fishing$price.beach <- NA
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
head(Fish)
#> mode income alt price catch chid
#> 1.beach FALSE 7083.332 beach NA 0.0678 1
#> 1.boat FALSE 7083.332 boat 157.930 0.2601 1
#> 1.charter TRUE 7083.332 charter 182.930 0.5391 1
#> 1.pier FALSE 7083.332 pier 157.930 0.0503 1
#> 2.beach FALSE 1250.000 beach NA 0.1049 2
#> 2.boat FALSE 1250.000 boat 10.534 0.1574 2
mlogit(mode ~ catch + price | income, data = Fish, na.action = na.omit)
#> Error in solve.default(H, g[!fixed]): system is computationally singular: reciprocal condition number = 3.92205e-24
Created on 2019-07-08 by the reprex package (v0.2.1)
This happens when price is moved to the alternative-specific variable position as well. I think the issue may lie in the na.action function argument, but I can't find any documentation on this argument beyond the basic documentation tag:
na.action: a function which indicates what should happen when the data contains NAs
There appear to be no examples showing how this term is used differently and what the results are. There's a related unanswered question here.
Answer 1:
There appear to be a few things going on.
I am not quite sure how na.action = na.omit works under the hood, but it sounds to me like it will drop the entire row. I always find it better to do this explicitly.
When you drop the entire row, every choice occasion in which the dropped alternative (here, beach) was the one chosen is left with no chosen alternative. This is not going to work. Remember, we are working with logit-type probabilities, which are defined over the alternatives in the choice set, so the chosen alternative has to be one of them. Furthermore, if no choice is observed, no information is gained, so we need to drop these choice occasions entirely. Doing these two steps in combination, I am able to run the model you propose.
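To make the problem concrete, here is a minimal base-R sketch on made-up long-format data (the data frame `long` and its values are hypothetical, not taken from Fishing). It assumes the NA handling behaves like `na.omit()` on an ordinary data frame, which I have not verified for mlogit internally.

```r
# Toy long-format data: 3 choice occasions (chid) x 3 alternatives.
# Names and values are made up for illustration only.
long <- data.frame(
  chid  = rep(1:3, each = 3),
  alt   = rep(c("beach", "boat", "pier"), times = 3),
  mode  = c(TRUE, FALSE, FALSE,   # occasion 1 chose beach
            FALSE, TRUE, FALSE,   # occasion 2 chose boat
            FALSE, FALSE, TRUE),  # occasion 3 chose pier
  price = c(NA, 10, 12, NA, 20, 22, NA, 30, 32)
)

# na.omit() drops every row that contains an NA, here all beach rows
kept <- na.omit(long)

# Count how many chosen alternatives survive in each choice occasion
n_chosen <- tapply(kept$mode, kept$chid, sum)
print(n_chosen)
```

Occasion 1 chose beach, so after the beach rows are removed it has two alternatives left and none of them is marked as chosen. Those are the occasions that have to be removed before estimation.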
Here is a commented working example:
library(mlogit)
# Read in the data
data("Fishing", package = "mlogit")
# Set price for the beach option to NA
Fishing$price.beach <- NA
# Scale income
Fishing$income <- Fishing$income / 10000
# Turn into 'mlogit' data
fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
# Explicitly drop the alts with NA in price
fish <- fish[fish$alt != "beach", ]
# Dropping all NA also means that we now have choice occasions where no choice
# was made and we need to get rid of these as well
fish$choice_made <- rep(colSums(matrix(fish$mode, nrow = 3)), each = 3)
fish <- fish[fish$choice_made == 1, ]
fish <- mlogit.data(fish, shape = "long", alt.var = "alt", choice = "mode")
# Run an MNL model
mnl <- mlogit(mode ~ catch + price | income, data = fish)
summary(mnl)
In general, when working with these models, I find it very useful to always make all data transformations explicitly before running a model rather than rely on arguments such as na.action.
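One caveat on the choice_made step in the example above: reshaping fish$mode into a matrix with three rows relies on the data being sorted by choice occasion with exactly three alternatives each. A possible alternative, sketched here in base R on hypothetical data (`long`, `complete` and the values are made up), is to count chosen alternatives per choice occasion with `ave()`, which does not depend on row order or on a fixed number of alternatives.

```r
# Hypothetical long-format data after the beach rows have been removed.
# Rows are deliberately not sorted by choice occasion.
long <- data.frame(
  chid = c(2, 1, 3, 1, 2, 3),
  alt  = c("boat", "boat", "pier", "pier", "pier", "boat"),
  mode = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Number of chosen alternatives left in each choice occasion,
# repeated on every row of that occasion
long$choice_made <- ave(as.integer(long$mode), long$chid, FUN = sum)

# Keep only occasions that still contain their chosen alternative
complete <- long[long$choice_made == 1, ]
complete <- complete[order(complete$chid, complete$alt), ]
print(complete)
```

The filtered data frame could then be passed to mlogit.data(..., shape = "long") in the same way as in the example above.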
Source: https://stackoverflow.com/questions/56939628/handling-alternative-specific-na-values-in-mlogit