Persisting data in Google Colaboratory

眼角桃花

Has anyone figured out a way to keep files persisted across sessions in Google's newly open-sourced Colaboratory?

Using the sample notebooks, I'm successfully authe

7 answers

悲哀的现实

2020-12-28 14:42

If anyone's interested in saving and restoring the whole session, here's a snippet I'm using that you might find useful:

import os
import dill
from google.colab import drive

backup_dir = 'drive/My Drive/colab_sessions'
backup_file = 'notebook_env.db'
backup_path = backup_dir + '/' + backup_file

def init_drive():
  # Mount Google Drive, then create the backup directory if it does not exist
  drive.mount('drive')
  os.makedirs(backup_dir, exist_ok=True)

def restart_kernel():
  # Terminate the current kernel process to force a restart
  os._exit(0)

def save_session():
  init_drive()
  dill.dump_session(backup_path)

def load_session():
  init_drive()
  dill.load_session(backup_path)

Edit: This works fine as long as your session is not too big. You need to check whether it works for you.
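
If the whole session turns out to be too large to dump, one alternative is to save only the objects you actually need. Here is a minimal sketch using the standard `pickle` module. `save_vars`, `load_vars`, and the file name are hypothetical names, and a temporary directory stands in for a mounted Drive folder:

```python
import os
import pickle
import tempfile

# Stand-in for a mounted Drive folder such as 'drive/My Drive/colab_sessions'
backup_dir = tempfile.mkdtemp()
vars_path = os.path.join(backup_dir, 'selected_vars.pkl')

def save_vars(path, **named_objects):
  # Pickle only the objects passed in, keyed by name
  with open(path, 'wb') as f:
    pickle.dump(named_objects, f)

def load_vars(path):
  # Return the dict of previously saved objects
  with open(path, 'rb') as f:
    return pickle.load(f)

save_vars(vars_path, learning_rate=0.01, history=[1.0, 0.5, 0.25])
restored = load_vars(vars_path)
```

In a new session you would call `load_vars` and pull the values back out of the returned dict. Anything `pickle` cannot serialize still needs separate handling.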

面向向阳花

2020-12-28 14:50

As you pointed out, Google Colaboratory's file system is ephemeral. There are workarounds, though there's a network latency penalty and code overhead: e.g. you can use boilerplate code in your notebooks to mount external file systems like GDrive (see their example notebook).
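
The boilerplate mentioned here could look roughly like the sketch below. `get_persistent_dir` and the folder names are hypothetical, and it assumes the Drive root shows up as `MyDrive` under the mount point, which may differ on your runtime. Outside Colab it falls back to a local folder, so the same cell still runs:

```python
import os
import tempfile

def get_persistent_dir(subdir='colab_data'):
  # Hypothetical helper: a folder on Drive when running in Colab,
  # a local folder elsewhere.
  try:
    from google.colab import drive  # only importable inside Colab
  except ImportError:
    base = os.path.join(tempfile.gettempdir(), 'local_persist')
  else:
    drive.mount('/content/drive')
    # Assumption: the Drive root is exposed as 'MyDrive'
    base = '/content/drive/MyDrive'
  path = os.path.join(base, subdir)
  os.makedirs(path, exist_ok=True)
  return path

data_dir = get_persistent_dir()
with open(os.path.join(data_dir, 'notes.txt'), 'w') as f:
  f.write('survives the VM being recycled\n')
```

Everything written under `data_dir` goes through the Drive mount, so expect it to be slower than the VM's local disk.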

Alternatively, while this is not supported in Colaboratory, other Jupyter hosting services – like Jupyo – provision dedicated VMs with persistent file systems so the data and the notebooks persist across sessions.

无人共我

2020-12-28 14:51
I was interested in importing a module in a separate .py file.

What I ended up doing is copying the .py file contents to the first cell in my notebook, adding the following text as the first line:
```
%%writefile mymodule.py
```
This creates a separate file named mymodule.py in the working directory so your notebook can use it with an import line.

I know that running all of the module's code directly in the notebook would also make its variables and functions available, but my code required importing a module, so this was good enough for me.
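
For reference, the effect of `%%writefile` followed by an import can be reproduced in plain Python. This is only a small sketch. The module contents and the `greet` function are made up for illustration, and a temporary directory is used instead of the working directory:

```python
import importlib
import os
import sys
import tempfile

# Made-up module source standing in for the pasted .py file contents
module_source = '''
def greet(name):
    return 'Hello, ' + name
'''

# Write the file, as %%writefile mymodule.py would do in the working directory
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'mymodule.py'), 'w') as f:
  f.write(module_source)

# Make the directory importable, then import the freshly written module
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import mymodule

message = mymodule.greet('Colab')
```

In a notebook the `sys.path` step is normally unnecessary, because the working directory is already importable.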
花落未央

2020-12-28 14:53

Put this before your code so that the file is always downloaded before your code runs:

!wget -q http://www.yoursite.com/file.csv
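
If you would rather not download the file again on every run within the same session, a Python variant can skip the download when a local copy already exists. A rough sketch with the standard library follows. `ensure_file` is a hypothetical helper name, and the URL in the comment is the placeholder from above:

```python
import os
import urllib.request

def ensure_file(url, filename):
  # Download url to filename unless a local copy already exists
  if not os.path.exists(filename):
    urllib.request.urlretrieve(url, filename)
  return filename

# Example call with the placeholder URL:
# ensure_file('http://www.yoursite.com/file.csv', 'file.csv')
```

Note that this never refreshes an existing copy, so delete the local file if the remote one changes.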

甜味超标

2020-12-28 15:00

Your interpretation is correct. VMs are ephemeral and recycled after periods of inactivity. There's no mechanism for persistent data on the VM itself right now.

In order for data to persist, you'll need to store it somewhere outside of the VM, e.g., Drive, GCS, or any other cloud hosting provider.

Some recipes for loading and saving data from external sources are available in the I/O example notebook.
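
As a generic illustration (not taken from that notebook), one simple pattern is to zip the working directory into the external location at the end of a session and unpack it at the start of the next one. In this sketch `backup_dir_to` and `restore_dir_from` are hypothetical helpers, and temporary folders stand in for the VM disk and the external storage:

```python
import os
import shutil
import tempfile

def backup_dir_to(work_dir, persistent_dir, name='workdir_backup'):
  # Zip work_dir into persistent_dir and return the archive path
  os.makedirs(persistent_dir, exist_ok=True)
  return shutil.make_archive(os.path.join(persistent_dir, name), 'zip', work_dir)

def restore_dir_from(archive_path, work_dir):
  # Unpack a previously saved archive into work_dir
  os.makedirs(work_dir, exist_ok=True)
  shutil.unpack_archive(archive_path, work_dir)

# Demo with temporary folders standing in for the VM disk and external storage
work = tempfile.mkdtemp()
external = tempfile.mkdtemp()
with open(os.path.join(work, 'results.txt'), 'w') as f:
  f.write('epoch 3 done\n')

archive = backup_dir_to(work, external)
fresh_vm = tempfile.mkdtemp()
restore_dir_from(archive, fresh_vm)
```

With a mounted Drive folder as `persistent_dir`, the archive outlives the VM. For GCS or another provider you would upload the archive with that provider's own client instead.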

逝去的感伤

2020-12-28 15:05

Clouderizer may provide some data persistence, at the cost of a long setup (because you use Google Colab only as a host) and little space to work with.

But in my opinion, that's better than having your file(s) "recycled" when you forget to save your progress.
