Using the ff package of R, I imported a CSV file into an ffdf object, but was surprised to find that the object occupied some 700 MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM?
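For reference, the import looked roughly like this (the file name is a placeholder, and the size check is presumably where the 700 MB figure came from):

    library(ff)

    # Read a large CSV with many character columns into an ffdf (hypothetical file name).
    x <- read.csv.ffdf(file = "big_data.csv", header = TRUE)

    # Report the in-memory size of the ffdf object.
    print(object.size(x), units = "Mb")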
I had the same problem and posted a question about it, and there is a possible explanation for your issue. When you read a file, character columns are treated as factors, and if they have many unique levels, those levels go into RAM. ff always seems to keep factor levels in RAM. See this answer from jwijffels on my question:
Loading ffdf data take a lot of memory
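A small base-R sketch (made-up values) of why this matters: the level set of a factor is an ordinary in-memory character vector, regardless of where the data itself is stored.

    # A character column with many unique values (e.g. IDs) becomes a factor,
    # and every distinct value is kept as a level in RAM.
    ids <- sprintf("customer_%07d", 1:1e6)
    f <- factor(ids)
    print(object.size(levels(f)), units = "Mb")  # the levels alone occupy tens of Mb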
Best, Miguel.
The ff package uses memory mapping to load only the parts of the data into memory that are needed at the moment.
But it seems that by calling object.size, you actually force the whole thing to be loaded into memory! That's what the warning messages seem to indicate...
So don't do that... Use Task Manager (Windows) or the top command (Linux) to see how much memory the R process actually uses before and after you've loaded the data.
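A rough way to see this from inside R (the file name is hypothetical; gc() only reports R's own heap, while the OS tools show the full process size):

    library(ff)

    gc(reset = TRUE)                      # note R's heap usage before the import
    x <- read.csv.ffdf(file = "big.csv")  # data ends up in ff files on disk, not in the heap
    gc()                                  # the heap should grow far less than the file size

    # Avoid object.size(x): it touches the data and forces it into memory.
    # Compare the process footprint in Task Manager / top before and after instead.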
You need to provide the data in chunks to biglm; see ?biglm. If you pass an ffdf object instead of a data.frame, you will run into problems.
Check ?chunk.ffdf for an example of how to pass chunks from ffdf to biglm.
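A minimal sketch of that pattern, assuming dat is an ffdf with columns y, x1 and x2:

    library(ff)
    library(biglm)

    ck <- chunk(dat)                                  # chunk() dispatches to chunk.ffdf: a list of row ranges
    fit <- biglm(y ~ x1 + x2, data = dat[ck[[1]], ])  # fit the model on the first chunk
    for (i in ck[-1]) {
      fit <- update(fit, dat[i, ])                    # each dat[i, ] is an ordinary in-memory data.frame
    }
    summary(fit)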