read.csv blank fields to NA

Posted by 断了今生、忘了曾经 on 2019-11-26 20:23:17

Question


I have a tab-delimited text file named 'a.txt'. The D column is empty.

 A       B       C    D
10      20     NaN
30              40
40      30      20
20      NA      20

I want the data frame to look and behave exactly like the text file, with a blank in the 2nd row of the 2nd column.

Unfortunately, read.csv is converting all the blanks and NA to "NA". I want to read NA and NaN as characters.

 b <- read.csv("a.txt", sep = "\t", skip = 0, header = TRUE, comment.char = "", check.names = FALSE, quote = "")

To summarize: I want to replicate the same values in output file without modifying them:

  • If there is a blank in input, the output should be blank.
  • If the input has NA or NaN, then the output should also have NA or NaN.

Answer 1:


After reading the csv file, try the following. It will replace the NA values with "".

b[is.na(b)]<-""

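Note that is.na() is also TRUE for NaN in numeric columns, so the statement above blanks those as well. If you need to treat NaN separately, do it before that statement. is.nan() has no data frame method, so apply it column by column:

b[sapply(b, is.nan)] <- ""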
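As a small illustration, the sketch below uses a made-up data frame (its columns and values are hypothetical, not taken from a.txt) to show which cells is.na() flags and what the replacement does to the column types.

```r
# Hypothetical data frame standing in for the result of read.csv().
b <- data.frame(A = c(10, NA, NaN),
                B = c("x", NA, "z"),
                stringsAsFactors = FALSE)

# is.na() has a data frame method and returns a logical matrix;
# it is TRUE for NaN as well as for NA.
missing_cells <- is.na(b)

# Assigning "" through the logical matrix blanks every flagged cell.
# Column A is coerced from numeric to character as a side effect.
b[missing_cells] <- ""

str(b)
```

Because the numeric column turns into text, this approach is mainly useful right before writing the data back out.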



Answer 2:


The default for na.strings is just "NA", so you perhaps need to add "NaN". True blanks ("") are set to missing but spaces (" ") are not:

 b<- read.csv("a.txt",  skip =0,  
               comment.char = "",check.names = FALSE, quote="",
               na.strings=c("NA","NaN", " ") )

It's not clear that this is the problem, since your data example is malformed and does not have commas. read.csv assumes comma separation by default, so a tab-separated file read without sep="\t" will not split into columns. If your data is tab-separated, pass sep="\t" or use read.delim or read.table.

b <- read.table("a.txt", sep = "\t", skip = 0, header = TRUE,
                comment.char = "", check.names = FALSE, quote = "",
                na.strings = c("NA", "NaN", " "))
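For the tab-separated case, here is a minimal sketch using read.delim on inline text instead of a file. The lines below are an assumed reconstruction of a.txt, with a trailing tab on each row so that column D is present but empty.

```r
# Assumed tab-separated content; "\t" is a tab character.
tab_lines <- c("A\tB\tC\tD",
               "10\t20\tNaN\t",
               "30\t\t40\t",
               "40\t30\t20\t",
               "20\tNA\t20\t")

# read.delim defaults to sep = "\t" and header = TRUE.
b <- read.delim(text = tab_lines,
                na.strings = c("NA", "NaN", " "))

str(b)
is.na(b$B)   # the blank field and the literal NA are both missing
```

With these settings every blank, NA and NaN ends up as NA, which is the behaviour this answer describes rather than the verbatim round trip the question asks for.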

# worked example for csv text file connection
 bt <- "A,B,C
10,20,NaN
30,,40
40,30,20
,NA,20"

 b<- read.csv(text=bt, sep=",", 
                comment.char = "",check.names = FALSE, quote="\"",
                na.strings=c("NA","NaN", " ") )
 b
#--------------
   A  B  C
1 10 20 NA
2 30 NA 40
3 40 30 20
4 NA NA 20

Example 2:

bt <- "A,B,C,D
10,20,NaN
30,,40
40,30,20
,NA,20"

 b<- read.csv(text=bt, sep=",", 
                comment.char = "",check.names = FALSE, quote="\"",
                na.strings=c("NA","NaN", " ") , colClasses=c(rep("numeric", 3), "logical")) 
 b
#----------------
   A  B  C  D
1 10 20 NA NA
2 30 NA 40 NA
3 40 30 20 NA
4 NA NA 20 NA
> str(b)
'data.frame':   4 obs. of  4 variables:
 $ A: num  10 30 40 NA
 $ B: num  20 NA 30 NA
 $ C: num  NA 40 20 20
 $ D: logi  NA NA NA NA

It's mildly interesting that NA and NaN are not identical for numeric vectors. NaN is returned by operations that have no mathematical meaning (but as noted in the help page you get with ?NaN, the results of operations may depend on the particular OS). Tests of equality are not appropriate for either NaN or NA. There are specific is.* functions for them:

> Inf*0
[1] NaN

> is.nan(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE  TRUE FALSE
> is.na(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE  TRUE  TRUE  # note the difference
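To make the point about equality tests concrete, the short sketch below compares a vector against NA and NaN directly and then with the dedicated predicates. The vector x is just an illustrative value.

```r
x <- c(1, NaN, NA)

# Comparisons against NA or NaN never yield TRUE or FALSE; they yield NA.
x == NA
x == NaN

# The dedicated predicates give usable logical vectors instead.
is.nan(x)   # TRUE only for NaN
is.na(x)    # TRUE for both NaN and NA
```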



Answer 3:


You can specify colClasses in the read.csv statement to read the column as text.
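A minimal sketch of that idea, using inline CSV text modelled on the sample data. Reading as text alone still turns the literal NA into a missing value, so this sketch also sets na.strings to a sentinel; the sentinel string is a made-up value, assumed never to occur in the data.

```r
# Inline CSV sample modelled on the data in the question.
bt <- "A,B,C
10,20,NaN
30,,40
40,30,20
,NA,20"

# Read every column as text. The na.strings sentinel is hypothetical and
# chosen so that no field is ever treated as missing.
b <- read.csv(text = bt, colClasses = "character",
              na.strings = "__never_matches__")

b$B        # the blank field and the literal "NA" both survive as text
anyNA(b)   # no missing values were introduced

# Writing without quotes reproduces the original fields on the console.
write.csv(b, row.names = FALSE, quote = FALSE)
```

The trade-off is that all columns are character, so any arithmetic needs an explicit as.numeric() later.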




Answer 4:


Use the na.strings argument.
na.strings defines which strings in the data are read as NA. So if you write

read.csv(text=bt, na.strings = "abc")

then every field that is exactly "abc" is converted to NA.
Since "abc" does not occur in your data, no value is converted to NA.
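The effect can be seen side by side in the sketch below. The inline data is invented for illustration and deliberately contains "abc"; columns are read as character so the fields can be compared as text.

```r
# Hypothetical CSV text containing "abc", a literal NA and a blank field.
bt <- "A,B
x,NA
abc,
y,z"

# Default na.strings = "NA": only the literal NA becomes missing.
default <- read.csv(text = bt, colClasses = "character")

# Custom na.strings: "abc" becomes missing, the literal NA stays text.
custom <- read.csv(text = bt, colClasses = "character",
                   na.strings = "abc")

is.na(default$B)
is.na(custom$A)
custom$B
```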



Source: https://stackoverflow.com/questions/19125899/read-csv-blank-fields-to-na
