Why does ff still store data in RAM?

广开言路 2021-01-19 06:48

Using the ff package of R, I imported a csv file into a ffdf object, but was surprised to find that the object occupied some 700MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM?
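
For reference, roughly the kind of code this involves (the file name below is a placeholder, and the exact call may have differed):

```r
library(ff)

# Import the csv into an ffdf; the data itself is written to files on disk
dat <- read.csv.ffdf(file = "big.csv", header = TRUE)  # "big.csv" is a placeholder

# Measuring the object from within R is what suggests a ~700MB footprint
object.size(dat)
```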

3 Answers
  • 2021-01-19 06:59

    I had the same problem and posted a question about it; there is a possible explanation for your issue. When you read a file, character columns are treated as factors, and if there are many unique levels, they go into RAM. ff seems to always load factor levels into RAM. See this answer from jwijffels to my question:

    Loading ffdf data take a lot of memory
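
    To see this in a session, a minimal sketch (the file name is a placeholder): the rows stay on disk, but every character column becomes an ff factor whose levels table is held in RAM.

    ```r
    library(ff)

    # Character columns become ff factors on import ("big.csv" is hypothetical)
    dat <- read.csv.ffdf(file = "big.csv", header = TRUE)

    # Number of levels kept in RAM per column (0 for non-factor columns);
    # columns with many unique strings account for a large in-memory footprint
    sapply(physical(dat), function(x) length(levels(x)))
    ```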

    best, miguel.

  • 2021-01-19 07:02

    The ff package uses memory mapping to load only those parts of the data into memory that are currently needed.

    But it seems that by calling object.size, you actually force loading the whole thing into memory! That's what the warning messages seem to indicate...

    So don't do that... Use Task Manager (Windows) or the top command (Linux) to see how much memory the R process actually uses before and after you've loaded the data.
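
    For example (a rough sketch, reusing a placeholder file name): note the process id so you can find R in top or Task Manager, and compare R's own memory statistics before and after the import:

    ```r
    library(ff)

    Sys.getpid()   # PID of this R session, to watch in top / Task Manager

    gc()           # R-managed memory before the import

    dat <- read.csv.ffdf(file = "big.csv", header = TRUE)   # placeholder file

    gc()           # the increase should stay far below the size of the csv,
                   # because the ffdf keeps its data in files on disk
    ```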

  • 2021-01-19 07:10

    You need to provide the data in chunks to biglm, see ?biglm. If you pass an ffdf object instead of a data.frame, you run into one of the following two problems:

    1. ffdf is not a data.frame, so something undefined happens
    2. the function you pass it to tries to convert the ffdf to a data.frame, e.g. via as.data.frame(ffdf), which easily exhausts your RAM; this is likely what happened to you

    Check ?chunk.ffdf for an example of how to pass chunks from ffdf to biglm.
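
    Along these lines, as a sketch only (assuming an ffdf dat with a response y and one predictor x; adapt the formula to your data):

    ```r
    library(ff)
    library(biglm)

    # Split the ffdf's rows into RAM-sized ranges
    chunks <- chunk(dat)

    # Fit on the first chunk, then feed the remaining chunks incrementally;
    # dat[i, ] materialises only one chunk as a data.frame at a time
    fit <- biglm(y ~ x, data = dat[chunks[[1]], ])
    for (i in chunks[-1]) {
      fit <- update(fit, dat[i, ])
    }
    summary(fit)
    ```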
