Fast method to read csv with UTF-16LE encoding

Posted by 泄露秘密 on 2019-12-12 12:27:17

Question


I'm dealing with .csv files in UTF-16LE encoding. The call below reads them correctly, but base R's read.csv2 is very slow compared to readr's read_csv.

  read.csv2(path, dec = ",", skip = 1, header = TRUE, fileEncoding = "UTF-16LE", sep = "\t")

Unfortunately, I can't make read_csv work: I only get empty rows, and I can't find a way to specify the encoding in the function.

I can't share my data, but if anyone has dealt with this encoding, any help would be appreciated.


Answer 1:


You can specify the file encoding in readr functions such as read_csv through the locale argument: locale = locale(encoding = "UTF-16LE"). However, I haven't managed to read a UTF-16LE file with read_csv; I get an "Incomplete multibyte sequence" error. There is a related issue filed, but my file still fails to load. Hopefully others will have more success.
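
If the direct locale route fails as described, one possible fallback is to decode the file with base R and give readr a UTF-8 copy. The sketch below is only an illustration under some assumptions: the sample file, its column names (id, value) and the junk first line are invented stand-ins for the real data, and the R session is assumed to run in a UTF-8 locale.

```r
library(readr)

# Sample data: a small tab-separated UTF-16LE file with one junk line on top
# (an invented stand-in for the real file, which is not available).
path <- tempfile(fileext = ".csv")
out <- file(path, open = "w", encoding = "UTF-16LE")
writeLines(c("export header", "id\tvalue", "1\t1,5", "2\t2,25"), out)
close(out)

# Direct form from the answer above; reported there to fail for some files:
# read_tsv(path, skip = 1, locale = locale(encoding = "UTF-16LE"))

# Fallback: decode with base R, write a UTF-8 copy, and parse that copy.
inp <- file(path, open = "r", encoding = "UTF-16LE")
lines <- readLines(inp)
close(inp)

utf8_path <- tempfile(fileext = ".tsv")
writeLines(enc2utf8(lines), utf8_path, useBytes = TRUE)

df <- read_tsv(
  utf8_path,
  skip = 1,                                   # drop the junk first line
  col_types = cols(id = col_integer(), value = col_double()),
  locale = locale(decimal_mark = ",")         # values use a decimal comma
)
```

The decoding step costs one extra pass over the file, so whether this beats read.csv2 on real data would need to be measured.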




Answer 2:


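You can try fread from the data.table package, which is usually faster than read_csv. Note that fread has no fileEncoding argument and does not decode UTF-16 itself, so the text has to be decoded before fread parses it. The code can be something like the following.

library(data.table)
# Decode UTF-16LE through a connection, then let fread parse the lines
con <- file(path, encoding = "UTF-16LE"); lines <- readLines(con); close(con)
fread(text = lines)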

Hope this helps.
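
For files too large to hold comfortably as one character vector, a possible variant is to stream the file into a UTF-8 copy in chunks and point fread at that copy. This is a rough sketch, not a data.table feature: utf16le_to_utf8 is a hypothetical helper, the sample file and its columns are invented, and a UTF-8 session locale is assumed.

```r
library(data.table)

# Hypothetical helper: stream a UTF-16LE file into a UTF-8 copy, chunk by chunk,
# so the whole file never has to sit in memory as text.
utf16le_to_utf8 <- function(src, dest = tempfile(fileext = ".tsv"), chunk = 10000L) {
  input <- file(src, open = "r", encoding = "UTF-16LE")
  on.exit(close(input), add = TRUE)
  output <- file(dest, open = "w", encoding = "UTF-8")
  on.exit(close(output), add = TRUE)
  repeat {
    lines <- readLines(input, n = chunk)
    if (length(lines) == 0L) break
    writeLines(lines, output)
  }
  dest
}

# Invented sample: tab-separated, decimal commas, one junk line on top.
path <- tempfile(fileext = ".csv")
out <- file(path, open = "w", encoding = "UTF-16LE")
writeLines(c("export header", "id\tvalue", "1\t1,5", "2\t2,25"), out)
close(out)

utf8_copy <- utf16le_to_utf8(path)
dt <- fread(utf8_copy, sep = "\t", dec = ",", skip = 1, header = TRUE,
            encoding = "UTF-8")
```

The chunk size is arbitrary here; larger chunks mean fewer read calls at the price of more memory per step.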



Source: https://stackoverflow.com/questions/36862340/fast-method-to-read-csv-with-utf-16le-encoding
