I am trying to run a regression in R. The following code imports the CSV file without any problem:
dat <- read.csv('http://pastebin.com/raw.php?i=EWsLjKNN',sep=";")
dat # OK Works fine
Regdata <- lm(Y~.,na.action=na.omit, data=dat)
summary(Regdata)
However, when I run the regression it does not work. I get this error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
All values in my CSV file are numbers, and an empty "cell" holds the value "NA". Some columns have no empty cells, while some rows are partly empty, with the NA value...
So I don't understand why I get an error message even with:
na.action=na.omit
PS: The CSV data is available at: http://pastebin.com/EWsLjKNN
You get this error message because every row of your data frame contains at least one missing value. You can check this, for example, with this code:
apply(dat,1,function(x) sum(is.na(x)))
[1] 128 126 82 78 73 65 58 34 31 30 28 30 20 21 12 20 17 16 12 42 50 128
So when you run the regression with lm() and na.action=na.omit, all rows of the data frame are removed and there is no data left to fit the regression.
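Here is a minimal sketch of the same situation on a small made-up data frame (the `toy` data and its column names are hypothetical, not taken from the CSV above). It counts the rows that would survive na.omit and shows that lm() then has nothing to fit:

```r
# Hypothetical toy data: every row has at least one NA, as in the question
toy <- data.frame(
  Y  = c(1.0, 2.1, 2.9, 4.2),
  X1 = c(NA, 1, 2, 3),
  X2 = c(5, NA, NA, NA)
)

# Number of missing values in each row
na_per_row <- rowSums(is.na(toy))

# Number of rows with no missing value at all, i.e. rows kept by na.omit
n_complete <- sum(complete.cases(toy))

# With zero complete rows lm() fails; capture the error message instead of stopping
fit <- tryCatch(
  lm(Y ~ ., data = toy, na.action = na.omit),
  error = function(e) conditionMessage(e)
)

print(na_per_row)
print(n_complete)
print(fit)
```

Running sum(complete.cases(dat)) on the real data before calling lm() should tell you right away how many rows the regression can actually use.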
But this is not the main problem. If the data you provided contains all the information you have, then you are trying to fit a regression with 165 independent variables (X variables) while having only 22 observations. The number of independent variables has to be less than the number of observations.
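To see what happens when there are more predictors than observations, here is a small sketch on simulated data (the sizes `n` and `p` and the data frame `sim` are made up for illustration). lm() does not raise an error here, but it can only estimate as many coefficients as there are observations and reports the rest as NA:

```r
set.seed(1)

# Simulated example: 5 observations, 10 predictors, no missing values
n <- 5
p <- 10
sim <- as.data.frame(matrix(rnorm(n * p), nrow = n))
sim$Y <- rnorm(n)

fit <- lm(Y ~ ., data = sim)

# Coefficients that could be estimated versus those reported as NA
n_estimated <- sum(!is.na(coef(fit)))
n_dropped   <- sum(is.na(coef(fit)))

print(n_estimated)
print(n_dropped)
print(df.residual(fit))  # no degrees of freedom left for the residuals
```

With zero residual degrees of freedom the model reproduces the data exactly, so the fit says nothing useful about the relationship between Y and the predictors.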
I believe I can add a little clarity to this, since I experienced it personally and that is why I am here, except that my issue was with gls (generalized least squares) rather than the standard linear model. Similar logic might apply here, or in a similar situation.
I don't refute anything that anyone has said thus far. There might be some confusion between what people perceive as an observation and the way R perceives these things.
Say you have 160+ independent variables. Say you have a single given source from which all your data comes. You import it from a file, database, etc. Say you have an identical number of response variables, or something that satisfies R for your purpose of regression analysis.
R will tell you that you have 2 observations. Now, if you have similar data obtained in exactly the same manner from another source, you have 3 observations if you look at your global environment in RStudio.
The reason I mention this is that the term "observation" in the mathematical sense (as it is being used here) is completely acceptable. R, however, views an observation in more ways than one.
THAT was a big contributor to a similar problem I had, where it told me I had values missing, na.omit this, na.action that, etc. When I looked at the OrchardSpray demo and reviewed my own methodology, I figured it out.
The point is that how we perceive an "observation" in data is one thing. R has another term for it, and the way it spits out error messages can cause additional confusion.
See what I mean?
Source: https://stackoverflow.com/questions/13958722/r-linear-regression-issue-lm-fitx-y-offset-offset-singular-ok-singular