How Can I Make Sure All My .CSV Data Gets Imported as NA instead of Blank in R?

為{幸葍}努か, posted on 2019-12-12 01:36:12

Question


The dataset I'm using has four assessment levels that I'm trying to predict: 1 [Good] to 4 [Bad].

My model seems to work: I'm using the polr function to fit an ordered logistic regression. However, it gives me the warning message "In cbind(race, partisanship, sex, age) : number of rows of result is not a multiple of vector length (arg 4)", which I think is because some cells got imported as blanks instead of NAs.

Here's what the output looks like:

> mydata <- read.csv("~/Desktop/R/mydata.csv")
> attach(mydata)
> y <- as.factor(assessment)
> x <- cbind(race, partisanship, sex, age)
Warning message:
In cbind(race, partisanship, sex, age) :
  number of rows of result is not a multiple of vector length (arg 4)
> 
> olr <- polr(y ~ x, mydata)
> summary(olr)

Re-fitting to get Hessian

Call:
polr(formula = y ~ x, data = mydata)

Coefficients:
                 Value Std. Error t value
xrace          0.49485   0.214426  2.3078
xpartisanship -0.00990   0.002942 -3.3654
xsex          -0.21304   0.299763 -0.7107
xage           0.01486   0.006812  2.1819

Intercepts:
    Value   Std. Error t value
1|2 -1.4763  0.8253    -1.7887
2|3  1.8049  0.8237     2.1913
3|4  2.4739  0.8290     2.9842

Residual Deviance: 667.1306 
AIC: 681.1306 
(1401 observations deleted due to missingness)

I tried to fix the problem by adding na.strings = "" to the read.csv call and running x[x==""] <- NA. The summary output looks better, but I still get the warning.

It's the race column that, for some reason, imports missing cells as blanks instead of NAs. When I look at the data using View(mydata) in RStudio, I see blanks instead of NAs in the race column, while all the other columns have NAs where data is missing. When I print the output, though, it shows NAs.

For example, in RStudio, row 7 already shows an NA for partisanship, but row 10 shows a blank for race:

> head(x, 10)
      race partisanship age
 [1,]    2         97.4  80
 [2,]    2         96.7  75
 [3,]    3         95.0  70
 [4,]    3         87.7  65
 [5,]    3         85.2  60
 [6,]    3          4.7  50
 [7,]    3           NA  40
 [8,]    3          9.1  30
 [9,]    3          1.1  80
[10,]   NA         10.2  75

Does anybody have any ideas on how I can get rid of this warning? And is there a way to import all .csv files with NAs so I know everything is lining up properly?

EDIT: If it helps, after a bit more research, it looks like the missing values that show up as blanks instead of NAs come from manually editing the data to clean it up before loading it into R. Most of the data I have to import needs some clean-up first, so I don't know how to avoid this.

Thanks!


Answer 1:


It's getting to be a long string of comments, so let me put it into an answer.

It appears, from the cbind warning, that age, sex, partisanship, and race are not all the same length. This is a serious problem. It means that somewhere in your data, the link between age[n], sex[n], partisanship[n], and race[n] has been broken.

This might be the result of running na.omit on one or more of the vectors. An NA belongs wherever you don't know a value. If you know the age, sex, partisanship, and race of every participant except the age of participant 12, you need an NA in age[12] so that everything lines up. If you remove that NA, the value from age[13] moves into age[12] and gets matched with sex[12], partisanship[12], and race[12] instead of with sex[13], partisanship[13], and race[13]. If age was originally, say, 42 long, age[42] no longer has a value, and R is warning you that it forced things to work by wrapping around and filling row 42 with age[1].
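Here is a minimal sketch of that wrap-around. The vectors are made up for illustration (the names age_full, age_short, and sex and all their values are assumptions, not taken from your data):

```r
# Hypothetical data: five participants, the age of participant 3 is unknown.
age_full <- c(80, 75, NA, 65, 60)
sex      <- c(1, 2, 2, 1, 2)

# Dropping the NA shortens the vector and shifts every later value up one slot.
age_short <- as.vector(na.omit(age_full))
length(age_short)  # one element shorter than sex

# cbind() recycles the shorter vector to fill the last row and issues the
# "not a multiple of vector length" warning.
x_bad <- cbind(sex, age = age_short)

# Keeping the NA preserves the row-by-row link between the columns.
x_good <- cbind(sex, age = age_full)
```

In x_bad, row 3 pairs participant 3's sex with participant 4's age, and the last row reuses the first age. In x_good, row 3 simply carries an NA.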

Does that make sense?

So you need to figure out how the vectors became different lengths in the first place.
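One way to rule out the import step is to read blank cells as NA and then confirm that every column still has one entry per row. The sketch below assumes the column names from your question and uses a small inline CSV in place of mydata.csv. The cell values, including the text labels for race, are invented:

```r
# Hypothetical CSV content standing in for mydata.csv: row 2 has a blank
# partisanship cell and row 3 has a blank race cell.
csv_text <- "assessment,race,partisanship,sex,age
1,white,97.4,1,80
2,black,,2,75
4,,95.0,1,70
3,white,87.7,2,65"

# Treat both empty cells and the literal string "NA" as missing on import;
# strip.white = TRUE makes cells holding only spaces count as empty too.
mydata <- read.csv(text = csv_text,
                   na.strings = c("", "NA"),
                   strip.white = TRUE)

# Count missing values per column to confirm the blanks became NA.
colSums(is.na(mydata))

# Every column of a data frame has the same length, so rows stay aligned
# as long as you work with the data frame rather than separate vectors.
sapply(mydata, length)
```

If the columns look right here, it may be simpler to skip attach() and cbind() and give polr the columns by name, for example polr(factor(assessment) ~ race + partisanship + sex + age, data = mydata), so that the model takes whole rows from the data frame and drops incomplete ones itself.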



Source: https://stackoverflow.com/questions/23145430/how-can-i-make-sure-all-my-csv-data-gets-imported-as-na-instead-of-blank-in-r
