read.csv blank fields to NA

前端未结

关注

 4  951

I have tab delimited text file, named \'a.txt\'. The D column is empty.

 A       B       C    D
10      20     NaN
30              40
40      30      20
20


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  你的背包        
                
              
                            
                2020-12-05 10:51
              
            
            
                                                                       
Use the na.string argument.

na.string is used to define what arguments are to be read as na value from the data. So if you mention 

read.csv(text=bt, na.string = "abc")


then whenever in your data it the value "abc" occurs, then it will convert it into na.

Since "abc" is not found in your data it won't convert any value into na.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  生来不讨喜        
                
              
                            
                2020-12-05 11:01
              
            
            
                                                                       
Late edit: After re-reading this after the edits and extended comments, I'm wondering if what was needed (or asked for, at least) was pretty much the exact opposite of what I advise below. The request for this:


  Unfortunately, read.csv is converting all the blanks and NA to "NA". I want to read NA and NaN as characters.


,,, might have been satisfied (somewhat paradoxically) with the arguments: colClasses="character", stringsAsFactors=FALSE, na.strings="."`

Then any character value including an empty string would come in as itself. Arguing against this is the acceptance of the answer that converts empty character values ("") to R _NA_character values.

Here's a test example with various results:

 sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings=""), class )
#        A         B         C         D 
# "factor" "logical"  "factor" "numeric" 
 sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings="x"), class )
#        A         B         C         D 
# "factor" "logical"  "factor" "numeric" 
 sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', na.strings="x", stringsAsFactors=FALSE), class )
#          A           B           C           D 
#"character"   "logical" "character"   "numeric" 

#Almost the expressed desired result
 sapply(read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', #colClasses="character", stringsAsFactors=FALSE), class )
#          A           B           C           D 
#"character" "character" "character" "character" 
#But ... still get a real R <NA>
read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', colClasses="character", stringsAsFactors=FALSE)
#  A B    C   D
#1 a   <NA> NaN
#So add all three
 read.csv(text='A\tB\tC\tD\na\t""\tNA\tNaN', sep='\t', colClasses="character", stringsAsFactors=FALSE,na.strings=".")
#  A B  C   D
#1 a   NA NaN
# Finally all columns are character and no "real" R NA's




The default for na.strings is just "NA", so you perhaps need to add "NaN". True blanks ("") are set to missing but spaces (" ") are not:

 b<- read.csv("a.txt",  skip =0,  
               comment.char = "",check.names = FALSE, quote="",
               na.strings=c("NA","NaN", " ") )


It's not clear that this is the problem since your data example is malformed and does not have commas. That may be the fundamental problem since read.csv does not allow tab-separation. Use read.delim or read.table if your data has tab-separation.

b<- read.table("a.txt", sep="\t" skip =0, header = TRUE, 
               comment.char = "",check.names = FALSE, quote="",
               na.strings=c("NA","NaN", " ") )

# worked example for csv text file connection
 bt <- "A,B,C  
10,20,NaN
30,,40
40,30,20
,NA,20"

 b<- read.csv(text=bt, sep=",", 
                comment.char = "",check.names = FALSE, quote="\"",
                na.strings=c("NA","NaN", " ") )
 b
#--------------
   A  B  C
1 10 20 NA
2 30 NA 40
3 40 30 20
4 NA NA 20


Example 2:

bt <- "A,B,C,D
10,20,NaN
30,,40
40,30,20
,NA,20"

 b<- read.csv(text=bt, sep=",", 
                comment.char = "",check.names = FALSE, quote="\"",
                na.strings=c("NA","NaN", " ") , colClasses=c(rep("numeric", 3), "logical")) 
 b
#----------------
   A  B  C  D
1 10 20 NA NA
2 30 NA 40 NA
3 40 30 20 NA
4 NA NA 20 NA
> str(b)
'data.frame':   4 obs. of  4 variables:
 $ A: num  10 30 40 NA
 $ B: num  20 NA 30 NA
 $ C: num  NA 40 20 20
 $ D: logi  NA NA NA NA


It's mildly interesting that NA and NaN are not identical for numeric vectors. NaN is returned by operations that have no mathematical meaning (but as noted in the help page you get with ?NaN, the results of operations may depend on the particular OS. Tests of equality are not appropriate for either NaN or NA. There are specific is functions for them:

> Inf*0
[1] NaN

> is.nan(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE  TRUE FALSE
> is.na(c(1,2.2,3,NaN, NA) )
[1] FALSE FALSE FALSE  TRUE  TRUE  # note the difference

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2020-12-05 11:02
              
            
            
                                                                       
After reading the csv file, try the following. It will replace the NA values with "". 

b[is.na(b)]<-""


Fairly certain that won't fix your NaN values. That will need to be resolved in a separate statement

b[is.nan(b)]<-""

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2020-12-05 11:05
              
            
            
                                                                       
You can specify colClasses in the read.csv statement to read the column as text.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复