Reading big data in R by read.big.matrix

五迷三道 提交于 2019-12-08 02:13:28

问题


I am reading a data of dimension 3131875*5 in r using read.big.matrix. My data has both character and numeric columns including date variable. The command which I should use is

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

But type = NA is not accepted in R in this case and I am getting an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what should be the type here. I tried with options like double but that is throwing me same error.

Please help me.


回答1:


From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

Therefore, you won't be able to read in data with combinations of character, numeric, integer, date, etc. You could do some work on the file, for instance using a different program to convert the character variables to integer representations (like converting to a factor in R).

EDIT:

On the bigmemory website there's an example of preprocessing data using a python script to change character information to integer. The script is written for a specific dataset, but perhaps you could use it as a guideline for your data.



来源:https://stackoverflow.com/questions/12725603/reading-big-data-in-r-by-read-big-matrix

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!