Jupyter Lab freezes the computer when out of RAM - how to prevent it?

Tag: backend · unresolved · 7 answers · 1321 views
佛祖请我去吃肉 asked 2021-02-04 08:11

I have recently started using Jupyter Lab, and my problem is that I work with quite large datasets (usually the dataset itself is approx. 1/4 of my computer's RAM). After a few trans

7 Answers
  •  旧巷少年郎 answered 2021-02-04 08:29

    There is no reason to view the entire output of a large data frame. Viewing or manipulating large data frames unnecessarily consumes large amounts of your computer's resources.

    Whatever you are doing can be done in miniature. It's far easier to write and debug code when the data frame is small. The best way to work with big data is to create a new data frame that holds only a small portion, or a small random sample, of the large data frame. Explore the data and develop your code on the smaller data frame; once you have explored the data and gotten your code working, just run that code on the larger data frame.

    The easiest way is simply to take the first n rows from the data frame using the head() function, which returns only the first n rows. You can create a mini data frame by calling head() on the large data frame. Below I select the first 50 rows and assign them to small_df. This assumes BigData is a data object that comes from a package you loaded for this project.

    library(namedPackage) 
    
    df <- data.frame(BigData)                #  Assign big data to df
    small_df <- head(df, 50)         #  Assign the first 50 rows to small_df
    

    This will work most of the time, but sometimes the big data frame comes with presorted variables, or with variables that are already grouped. If so, the first n rows are not representative, and you should instead take a random sample of the rows from the big data frame, using the code that follows:

    df <- data.frame(BigData)
    
    set.seed(1016)                                                       # set your own seed
    
    df_small <- df[sample(nrow(df), size = 0.03 * nrow(df), replace = FALSE), ]   # sample 3% of rows
    df_small                                                             # much smaller df
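
    The question mentions Jupyter Lab, which usually implies Python rather than R. Assuming the data sits in a pandas DataFrame (big_df below is a hypothetical stand-in for your dataset), the same downsampling idea can be sketched as:

    ```python
    import pandas as pd

    # Hypothetical stand-in for a large dataset
    big_df = pd.DataFrame({"x": range(100_000), "y": range(100_000)})

    # First 50 rows, analogous to R's head(df, 50)
    small_df = big_df.head(50)

    # Reproducible 3% random sample, analogous to set.seed() + sample() in R
    sampled_df = big_df.sample(frac=0.03, random_state=1016)
    ```

    As with the R version, develop and debug against small_df or sampled_df, then rerun the finished code on the full data frame.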
    
