I can't read in data to R

猫巷女王i 2021-01-07 07:32

I am trying to read in some data from a text file that looks like this:

2009-08-09 - 2009-08-15 0   2   0
2009-08-16 - 2009-08-22 0   1   0
2009-08-23          

3 Answers
  • 2021-01-07 08:11

    The file you are reading is probably using some encoding other than ASCII. ?read.table shows

     read.table(file, header = FALSE, sep = "", quote = "\"'",
                ... 
                fileEncoding = "", encoding = "unknown")
    
    fileEncoding: character string: if non-empty declares the encoding used
              on a file (not a connection) so the character data can be
              re-encoded.  See 'file'. 
    

    So try setting the fileEncoding parameter. If you don't know the encoding, try "UTF-8" or "CP1252"; the encoding names your platform accepts are listed by iconvlist(). If that does not work, pastebin a snippet of your actual file and we may be able to identify the encoding.
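
    If you would rather not guess by hand, the trial-and-error can be scripted. This is only a rough sketch: try_encodings is a made-up helper name, the candidate list is arbitrary, and the demo file is written as UTF-16LE with a BOM purely as an assumption so that the example runs on its own.

    ```r
    # Hypothetical helper: try each candidate encoding and report which ones parse.
    try_encodings <- function(path, candidates) {
      vapply(candidates, function(enc) {
        result <- tryCatch(
          read.table(path, fileEncoding = enc, stringsAsFactors = FALSE),
          error = function(e) NULL,
          warning = function(w) NULL
        )
        # Count a candidate as plausible only if it yields more than one column.
        !is.null(result) && ncol(result) > 1
      }, logical(1))
    }

    # Demo file (an assumption, not the asker's data): two rows in UTF-16LE with a BOM.
    demo_path <- tempfile(fileext = ".txt")
    rows <- "2009-08-09 - 2009-08-15 0 2 0\n2009-08-16 - 2009-08-22 0 1 0\n"
    payload <- iconv(rows, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
    writeBin(c(as.raw(c(0xff, 0xfe)), payload), demo_path)

    plausible <- try_encodings(demo_path, c("UTF-8", "UTF-16LE"))
    print(plausible)
    ```

    Treating warnings as failures is a deliberate choice here, since a wrong encoding often shows up as an "invalid input" or "embedded nul" warning rather than an error.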

  • 2021-01-07 08:18

    What you see here:

    ÿþ
    

    is the Byte Order Mark (BOM) for UTF-16LE (or UCS-2LE): the bytes FF FE, displayed here as Latin-1 characters. See Wikipedia (Byte Order Mark) for an explanation. Your file might contain non-Latin characters that need this encoding, or it might have been created by Windows software that saves files with a BOM. The BOM is placed at the very beginning of the file, before all other data.

    R sees these characters and believes the data start here. Try:

    (1) If you don't need this encoding, simply open your data in a text editor (like Vim), change the encoding, save, and read into R. (In Vim do :write ++enc=utf-8 new_file_name.txt, then close the file and open the newly saved version, then do :set nobomb, just to be sure, then :wq.)

    (2) If you need the encoding or don't want to go through a text editor, tell R what encoding the file is in. You might experiment with:

    read.table("file.dat", fileEncoding = "UTF-16")
    read.table("file.dat", fileEncoding = "UTF-16LE")
    read.table("file.dat", fileEncoding = "UTF-16-LE")
    read.table("file.dat", fileEncoding = "UCS-2LE")
    

    If none of these work, try the solution given in the related question "How to detect the right encoding for read.csv?", and check the R Data Import/Export manual, which has a section on files with a BOM.
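
    Instead of experimenting, you can also look at the first bytes of the file and pick the encoding from the BOM. A minimal sketch, assuming the file really starts with a BOM: guess_bom_encoding is a hypothetical helper, and the demo file below is generated only so the example is self-contained.

    ```r
    # Hypothetical helper: map a file's leading bytes to a likely encoding name.
    guess_bom_encoding <- function(path) {
      head_bytes <- readBin(path, what = "raw", n = 3)
      starts_with <- function(bom) {
        length(head_bytes) >= length(bom) &&
          all(head_bytes[seq_along(bom)] == as.raw(bom))
      }
      if (starts_with(c(0xef, 0xbb, 0xbf))) return("UTF-8-BOM")
      # Caveat: UTF-32LE also begins with ff fe; this sketch ignores that case.
      if (starts_with(c(0xff, 0xfe))) return("UTF-16LE")
      if (starts_with(c(0xfe, 0xff))) return("UTF-16BE")
      NA_character_  # no BOM found; the encoding must be determined another way
    }

    # Demo file (an assumption): two rows in UTF-16LE, preceded by the ff fe BOM.
    demo_path <- tempfile(fileext = ".txt")
    rows <- "2009-08-09 - 2009-08-15 0 2 0\n2009-08-16 - 2009-08-22 0 1 0\n"
    payload <- iconv(rows, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
    writeBin(c(as.raw(c(0xff, 0xfe)), payload), demo_path)

    guessed <- guess_bom_encoding(demo_path)
    dat <- read.table(demo_path, fileEncoding = guessed, stringsAsFactors = FALSE)
    print(guessed)
    print(dim(dat))
    ```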

  • 2021-01-07 08:24

    Your separator could be spaces rather than tabs. If you leave the sep argument as "", it will use any kind of white space.
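
    A quick way to convince yourself, using two rows modelled on the question (the tabs in the second row are added here for illustration only):

    ```r
    # Sample rows: the first uses runs of spaces, the second uses tabs.
    sample_rows <- c(
      "2009-08-09 - 2009-08-15 0   2   0",
      "2009-08-16 - 2009-08-22 0\t1\t0"
    )

    # sep = "" splits on any run of white space, so both rows parse the same way.
    parsed <- read.table(text = sample_rows, sep = "", stringsAsFactors = FALSE)
    print(dim(parsed))
    ```

    Note that the lone "-" between the two dates becomes a column of its own.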

    EDIT: Actually, the encoding does sound more likely as the source of the problem.

    Read in the file with readLines, then check the encoding with Encoding.
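
    Something along these lines, as a sketch. Keep in mind that Encoding() only reports the mark R has attached to a string, so for freshly read lines it will usually say "unknown"; validUTF8() is added here as an extra check and is not part of the suggestion above. The demo file is an assumption (UTF-16LE with a BOM) so that the snippet runs on its own.

    ```r
    # Demo file (an assumption): one row in UTF-16LE, preceded by the ff fe BOM.
    demo_path <- tempfile(fileext = ".txt")
    row_bytes <- iconv("2009-08-09 - 2009-08-15 0 2 0\n",
                       from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
    writeBin(c(as.raw(c(0xff, 0xfe)), row_bytes), demo_path)

    # skipNul drops the zero bytes UTF-16 puts between ASCII characters;
    # warn = FALSE silences the related warnings for this quick inspection.
    first_line <- readLines(demo_path, n = 1, warn = FALSE, skipNul = TRUE)

    declared <- Encoding(first_line)     # the declared mark, not a detection
    looks_utf8 <- validUTF8(first_line)  # FALSE hints that the file is not UTF-8
    print(declared)
    print(looks_utf8)
    ```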
