I can't read data into R


I am trying to read in some data from a text file that looks like this:

2009-08-09 - 2009-08-15 0   2   0
2009-08-16 - 2009-08-22 0   1   0
2009-08-23

3 answers

孤街浪徒, 2021-01-07 08:11

The file you are reading probably uses an encoding other than ASCII.
?read.table shows:

 read.table(file, header = FALSE, sep = "", quote = "\"'",
            ... 
            fileEncoding = "", encoding = "unknown")

fileEncoding: character string: if non-empty declares the encoding used
          on a file (not a connection) so the character data can be
          re-encoded.  See 'file'. 


So try setting the fileEncoding parameter. If you don't know the encoding, try "UTF-8" or "CP1252". If neither works, paste a snippet of your actual file on Pastebin and we may be able to identify the encoding.
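The trial-and-error approach described above can be sketched as a small helper that tries a few candidate encodings in turn. The function name read_with_encodings and the candidate list are assumptions for illustration, and treating any warning as "wrong encoding" is a heuristic, not something read.table guarantees.

```r
# Hypothetical helper: try candidate encodings in order and return the first
# one that read.table() accepts without errors or warnings. The candidate
# list is a guess, not an exhaustive or authoritative set.
read_with_encodings <- function(path,
                                encodings = c("UTF-8", "CP1252", "UCS-2LE")) {
  for (enc in encodings) {
    result <- tryCatch(
      read.table(path, fileEncoding = enc),
      # Heuristic: a warning such as "invalid input found on input
      # connection" is taken to mean this encoding is wrong.
      warning = function(w) NULL,
      error = function(e) NULL
    )
    if (!is.null(result)) {
      return(list(encoding = enc, data = result))
    }
  }
  stop("none of the candidate encodings could read ", path)
}

# Usage ("file.txt" is a placeholder path):
# res <- read_with_encodings("file.txt")
# res$encoding
# head(res$data)
```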

慢半拍i, 2021-01-07 08:18

What you see here:

ÿþ


is the Byte Order Mark (BOM) for UTF-16LE or UCS-2LE: the two bytes 0xFF 0xFE, displayed as if they were Latin-1 text. See Wikipedia (Byte Order Mark) for an explanation. Your file might contain characters from non-Latin scripts that need this encoding, or it might have been created by Windows software that saves files with a BOM. The BOM is placed at the very beginning of a file, before all other data.
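To confirm whether a file starts with a BOM, you can look at its first few raw bytes instead of its decoded text. This is a minimal sketch; the function name detect_bom is hypothetical, and it only checks the UTF-8, UTF-16LE and UTF-16BE marks (a UTF-32LE file, whose BOM is FF FE 00 00, would be reported as UTF-16LE).

```r
# Hypothetical helper: report which BOM (if any) a file starts with.
# Only three common BOMs are checked; UTF-32 BOMs are not distinguished.
detect_bom <- function(path) {
  # readBin() accepts a file name and reads it in binary mode.
  bytes <- readBin(path, what = "raw", n = 4)
  starts_with <- function(prefix) {
    length(bytes) >= length(prefix) &&
      identical(bytes[seq_along(prefix)], as.raw(prefix))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")  # shows up as "ÿþ"
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_  # no recognised BOM
}

# Usage ("file.dat" is a placeholder path):
# detect_bom("file.dat")
```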

R treats these bytes as ordinary characters and thinks the data start with them. Try one of the following:

(1) If you don't need this encoding, simply open your data in a text editor (like Vim), change the encoding, save, and read the file into R. (In Vim, run :write ++enc=utf-8 new_file_name.txt, close the file, open the newly saved version, run :set nobomb just to be sure, then :wq.)

(2) If you need the encoding or don't want to go through a text editor, tell R what encoding the file is in. You might experiment with:

read.table("file.dat", fileEncoding = "UTF-16")
read.table("file.dat", fileEncoding = "UTF-16LE")
read.table("file.dat", fileEncoding = "UTF-16-LE")
read.table("file.dat", fileEncoding = "UCS-2LE")


If none of these work, try the solution given in the related question "How to detect the right encoding for read.csv?", and check the R Data Import/Export manual, which has a section on files with a BOM.
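The re-encoding in option (1) can also be done from within R instead of a text editor. This is a sketch under the assumption that the input is UTF-16LE/UCS-2LE with a BOM; the function name convert_to_utf8 and its from default are hypothetical. According to ?file, R drops a leading BOM when reading with "UCS-2LE"; the code strips a leftover U+FEFF anyway in case another from value keeps it.

```r
# Hypothetical helper: re-save a file as UTF-8 without a BOM.
convert_to_utf8 <- function(in_path, out_path, from = "UCS-2LE") {
  # A connection opened with `encoding` re-encodes the text as it is read.
  in_con <- file(in_path, open = "r", encoding = from)
  lines <- readLines(in_con)
  close(in_con)

  # Defensive: remove a BOM character if one survived decoding.
  if (length(lines) > 0) {
    lines[1] <- sub("^\ufeff", "", lines[1])
  }

  # Write the lines back out encoded as UTF-8 (no BOM is added).
  out_con <- file(out_path, open = "w", encoding = "UTF-8")
  writeLines(lines, out_con)
  close(out_con)
  invisible(out_path)
}

# Usage (placeholder file names):
# convert_to_utf8("file.dat", "file_utf8.dat")
# read.table("file_utf8.dat")
```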

名媛妹妹, 2021-01-07 08:24

Your separator could be spaces rather than tabs. If you leave the sep argument as "" (the default), read.table treats any run of white space as a separator.
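To illustrate the separator point, here is a sketch using the two complete sample rows from the question (the truncated third row is left out). The column names are made up for illustration; the question does not say what the three numeric columns mean.

```r
# The two complete sample rows from the question.
sample_text <- "2009-08-09 - 2009-08-15 0   2   0
2009-08-16 - 2009-08-22 0   1   0"

# sep = "" (the default): any run of spaces or tabs separates fields,
# so the uneven spacing is handled and "-" becomes its own column.
df <- read.table(text = sample_text, sep = "",
                 col.names = c("start", "dash", "end", "x1", "x2", "x3"))
str(df)

# sep = "\t": the sample contains no tabs, so each line stays one field.
tab_df <- read.table(text = sample_text, sep = "\t")
ncol(tab_df)
```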

EDIT: Actually, the encoding does sound more likely as the source of the problem.

Read in the file with readLines, then check the encoding with Encoding. 
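A sketch of that check follows. Note that Encoding() reports the encoding R has declared for each string ("unknown", "latin1", "UTF-8" or "bytes"); it does not detect a file's encoding, so looking at the first bytes with charToRaw() is often more revealing (a BOM shows up as ff fe or ef bb bf). The function name inspect_first_line is hypothetical.

```r
# Hypothetical helper: read the first line and report its declared
# encoding plus its first few bytes.
inspect_first_line <- function(path) {
  # warn = FALSE silences warnings about a missing final EOL or embedded nuls.
  lines <- readLines(path, n = 1, warn = FALSE)
  if (length(lines) == 0) return(NULL)  # empty file
  list(
    declared = Encoding(lines),              # declared mark, not detection
    first_bytes = head(charToRaw(lines[1]), 8)
  )
}

# Usage ("file.dat" is a placeholder path):
# inspect_first_line("file.dat")
```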