'Embedded nul in string' error when importing csv with fread

[愿得一人] 2020-11-29 08:56

I have a large file (3.5 GB) that I'm trying to import using data.table::fread.

It was originally created from an rpt file that was opened as text and sa

6 Answers
  • 2020-11-29 09:38

    A non-technical way to solve this is to:

    1. Open the problematic .csv

    2. Ctrl+A (Select all)

    3. Open new Excel sheet

    4. Right click and choose 'Paste as values'

    5. Save and use this file in place of original one.

    Worked for me, and doesn't take much time.

  • 2020-11-29 09:39

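    We can remove the NUL characters on the command line using something like the following (GNU sed syntax; write to a new file, because redirecting onto the input file would truncate it before sed reads it):

    sed 's/\x00//g' mycsv.csv > mycsv_clean.csv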
    

    Or, as suggested by @marbel, fread lets you pass the sed call directly as a shell command, for example:

    fread("sed 's/\\0//g' mycsv.csv")
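If sed is not available (for example on Windows), the same byte-level clean-up can be sketched in base R. This is only an illustration, not part of the answer above: the helper name `strip_nul_bytes` and the chunk size are made up, and it assumes the NUL bytes are stray padding rather than part of a multi-byte encoding such as UTF-16.

```r
# Hypothetical helper (not from the answer above): copy `src` to `dest`
# while dropping every NUL (0x00) byte. The file is read in fixed-size
# chunks so a multi-gigabyte input never sits in memory all at once.
strip_nul_bytes <- function(src, dest, chunk_size = 1e6) {
  con_in <- file(src, "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break          # end of file reached
    writeBin(chunk[chunk != as.raw(0)], con_out)
  }
  invisible(dest)
}

# Demo on a tiny temporary file that contains two embedded NUL bytes.
src <- tempfile(fileext = ".csv")
dest <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1"), as.raw(0),
           charToRaw(",2"), as.raw(0), charToRaw("\n")), src)

strip_nul_bytes(src, dest)
cleaned <- readBin(dest, what = "raw", n = file.size(dest))
```

The cleaned file can then be passed to fread as usual. Working on raw bytes avoids any text decoding, which is why it does not trip over the NULs itself.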
    
  • 2020-11-29 09:46

    You can try this small function:

    # Re-write `file` to `newfile`, dropping embedded NUL characters.
    cleanFiles <- function(file, newfile) {
      writeLines(iconv(readLines(file, skipNul = TRUE)), newfile)
    }
    

    It works for me.
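A possible way to try the function on a throwaway file is sketched below. The temporary file names and the sample contents are invented for the illustration; the function body is repeated only so the snippet runs on its own.

```r
# The function from the answer, repeated so this snippet is self-contained.
cleanFiles <- function(file, newfile) {
  writeLines(iconv(readLines(file, skipNul = TRUE)), newfile)
}

# Build a small temporary CSV with embedded NUL bytes (made-up sample data).
dirty <- tempfile(fileext = ".csv")
clean <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,name\n1,"), as.raw(0),
           charToRaw("foo\n2,bar"), as.raw(0), charToRaw("\n")), dirty)

# Write the cleaned copy, then read it back with an ordinary CSV reader.
cleanFiles(dirty, clean)
result <- read.csv(clean)
```

Note that readLines loads the whole file into memory, so for a file of several gigabytes this approach needs correspondingly more RAM.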

  • 2020-11-29 09:46

    If you are seeing NUL (\x00) characters in an ASCII file, you can do this: data.table::fread(text = readLines(pathIn, skipNul = TRUE), ...)
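The one-liner above can be tried end to end roughly as follows. The file contents are invented, and the base-R fallback is an addition for this sketch so that it still runs where data.table is not installed.

```r
# Create a small temporary file with one embedded NUL byte (sample data).
path_in <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("x,y\n1,2"), as.raw(0), charToRaw("\n3,4\n")), path_in)

# skipNul = TRUE makes readLines drop NUL bytes instead of choking on them.
lines <- readLines(path_in, skipNul = TRUE)

if (requireNamespace("data.table", quietly = TRUE)) {
  # Parse the already-cleaned lines directly, without a temporary file.
  dt <- data.table::fread(text = lines)
} else {
  # Base-R fallback (an assumption of this sketch, not part of the answer).
  dt <- utils::read.csv(text = lines)
}
```

Because the lines are parsed straight from memory, no intermediate cleaned file is written, at the cost of holding the whole file as a character vector.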

  • 2020-11-29 09:49

    I ran into a similar error and am sharing it in case others hit the same issue:

      embedded nul in string: '\0HA\xa8S\001\0\0\0\xd8@\xa8S\001\0\0\0h@\xa8S\001\0\0\0\xf8?\xa8S\001\0\0\0\x88'
    Calls: as.data.table -> fread
    

    The cause turned out to be mismatched column lengths: my first column (the headers) was shorter than the rest.
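One way to check for this kind of mismatch before importing is to count the fields on every line. The sketch below assumes that "different column lengths" means the header line has fewer fields than the data rows; the sample file is invented.

```r
# Sample file whose header has two fields while the data rows have three.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2,3", "4,5,6"), path)

# count.fields() returns the number of fields found on each line.
n_fields <- count.fields(path, sep = ",")

# Tabulate the counts, and list the lines that disagree with the header.
field_table <- table(n_fields)
bad_rows <- which(n_fields != n_fields[1])
```

If `field_table` shows more than one distinct count, the file is ragged and the header or the offending rows need fixing before fread can parse it cleanly.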

  • 2020-11-29 09:53

    If the file is actually encoded as UTF-16LE (which stores a NUL byte after every ASCII character), you can use read.csv with fileEncoding set to UTF-16LE rather than fread.

    read.csv("mycsv.csv",fileEncoding="UTF-16LE")
    

    Given the size of your data, read.csv will take a couple of minutes, but I don't think that is a big deal.
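Whether a file really is UTF-16 can be guessed by looking at its first bytes. The sketch below is a rough heuristic, not a definitive detector: the sample file is invented, and it relies on the fact that ASCII text stored as UTF-16LE has a NUL as every second byte, so about half of the leading bytes are zero.

```r
# Write a tiny UTF-16LE file (no byte-order mark) to experiment with.
path <- tempfile(fileext = ".csv")
utf16 <- iconv("a,b\n1,2\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(utf16, path)

# Inspect the first bytes: a NUL share near 0.5 hints at UTF-16 text,
# while a handful of stray NULs would give a share close to 0.
head_bytes <- readBin(path, what = "raw", n = 1000)
nul_share <- mean(head_bytes == as.raw(0))

# With the encoding declared, read.csv decodes the file correctly.
df <- read.csv(path, fileEncoding = "UTF-16LE")
```

A file that starts with the bytes FF FE carries a UTF-16LE byte-order mark, which is an even stronger hint than the NUL share.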
