Plotting of very large data sets in R

陌清茗 2021-01-31 04:16

How can I plot a very large data set in R?

I'd like to use a boxplot, a violin plot, or something similar. The data cannot all fit in memory. Can I incrementally read in and

8 answers
  •  日久生厌
    2021-01-31 04:29

    You could make the plots from a manageable sample of your data. For example, if you use only 10% of the rows, chosen at random, a boxplot of that sample should look nearly the same as a boxplot of the full data.
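
    As a rough check of that claim, here is a minimal sketch that compares per-group five-number summaries on a full vector and on a random 10% of it. The data, the three group labels, and the seed are made up for illustration; they are not from the question.

    ```r
    # Hypothetical demo data standing in for the full data set.
    set.seed(42)
    N <- 1e6
    x <- rnorm(N)
    g <- factor(sample(c("a", "b", "c"), N, replace = TRUE))

    # Keep a random 10% of the rows.
    keep <- sample.int(N, size = N %/% 10)
    x_s <- x[keep]
    g_s <- g[keep]

    # Five-number summary (min, lower hinge, median, upper hinge, max) per group.
    full_stats   <- sapply(split(x, g), fivenum)
    sample_stats <- sapply(split(x_s, g_s), fivenum)

    # Largest disagreement in the hinges and median (rows 2 to 4).
    max_diff_quartiles <- max(abs(full_stats[2:4, ] - sample_stats[2:4, ]))
    print(round(full_stats, 3))
    print(round(sample_stats, 3))

    # Boxplot drawn from the sample only.
    boxplot(x_s ~ g_s)
    ```

    The median and hinges should agree closely. The extremes and the individual outlier points depend more on sample size, so the sampled plot will usually show fewer outliers.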

    If your data are in a database, you should be able to create a random flag there and select rows by it (as far as I know, almost every database engine has some kind of random number generator).
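
    A sketch of the database idea, assuming the DBI and RSQLite packages are installed. The table name `measurements` and the columns `value` and `grp` are hypothetical, and an in-memory SQLite table stands in for a real database. Other engines spell the random function differently (for example `RAND()` in MySQL), so the `WHERE` clause would need adapting.

    ```r
    library(DBI)

    # Hypothetical in-memory SQLite table standing in for a real database.
    con <- dbConnect(RSQLite::SQLite(), ":memory:")
    set.seed(1)
    dbWriteTable(con, "measurements",
                 data.frame(value = rnorm(1e5),
                            grp = sample(letters[1:5], 1e5, replace = TRUE)))

    # SQLite's random() returns a signed 64-bit integer, so
    # `random() % 10 = 0` keeps roughly one row in ten.
    smp <- dbGetQuery(con,
      "SELECT value, grp FROM measurements WHERE random() % 10 = 0")
    dbDisconnect(con)

    # Only the sampled rows ever reach R.
    boxplot(value ~ grp, data = smp)
    ```

    The sampling happens inside the database, so R only has to hold the selected rows.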

    The second question is how large your data set really is. For a boxplot you need only two columns: a value variable and a group variable. This example:

    N <- 1e6
    x <- rnorm(N)                      # value variable
    # 100 random 40-letter strings to use as group labels
    b <- sapply(1:100, function(i) paste(sample(letters, 40, TRUE), collapse = ""))
    g <- factor(sample(b, N, TRUE))    # group variable
    boxplot(x ~ g)
    

    needs about 100 MB of RAM. With N = 1e7 it uses less than 1 GB of RAM, which is still manageable on a modern machine.
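
    To check figures like these on your own machine, one option is `object.size()` together with `gc()`. This sketch measures only the two stored vectors (8 bytes per double, 4 bytes per factor code, plus the level strings). It is an assumption on my part that the figures above refer to the whole session, including the temporary copies `boxplot()` makes, so expect the numbers printed here to be smaller.

    ```r
    N <- 1e6
    x <- rnorm(N)
    b <- sapply(1:100, function(i) paste(sample(letters, 40, TRUE), collapse = ""))
    g <- factor(sample(b, N, TRUE))

    # Size of the stored vectors only, not of any temporaries.
    size_x <- object.size(x)
    size_g <- object.size(g)
    print(size_x, units = "Mb")
    print(size_g, units = "Mb")

    # gc() reports the memory currently used by the R session
    # and the maximum used so far.
    gc()
    ```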
