Every time I shut down an IPython notebook and re-open it, I have to re-run all the cells. But some cells involve lengthy computations whose results I'd like to keep.
Unfortunately, there doesn't seem to be anything as convenient as an automatic cache. The %store
magic is close, but it requires you to do the caching and reloading manually and explicitly:
a = 1
%store a
Now, let's say you close the notebook and the kernel gets restarted. You no longer have access to the local variables, but you can reload the variables you've stored using the -r option:
%store -r a
print(a)  # should print 1
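Since the storing and restoring are manual, the usual workaround is a small cache-or-compute guard at the top of the expensive cell. A minimal sketch (slow_computation() is a stand-in for your own expensive code, and result is just an example name):

%store -r result               # restore 'result' if a previous session stored it
if 'result' not in globals():  # nothing was restored, so compute it now
    result = slow_computation()
%store result                  # (re)store it for future sessions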
Use the %cache magic.
%cache myVar = someSlowCalculation(some, "parameters")
This will calculate someSlowCalculation(some, "parameters") once; on subsequent runs it restores myVar from storage.
https://pypi.org/project/ipython-cache/
Under the hood it does pretty much the same as the accepted answer.
In fact, the functionality you ask for is already there; there is no need to re-implement it manually with your own dumps.
You can use the %store magic or, perhaps better, the %%cache cell magic (an extension) to store the results of these intermediate cells, so they don't have to be recomputed (see https://github.com/rossant/ipycache).
It is as simple as:
%load_ext ipycache
Then, in a cell e.g.:
%%cache mycache.pkl var1 var2
var1 = 1
var2 = 2
When you execute this cell the first time, the code is executed, and the variables var1 and var2 are saved in mycache.pkl in the current directory along with the outputs. Rich display outputs are only saved if you use the development version of IPython. When you execute this cell again, the code is skipped, the variables are loaded from the file and injected into the namespace, and the outputs are restored in the notebook.
It automatically saves all graphics, printed output, and the variables you specify for you :)
Can you give an example of what you are trying to do? When I run something expensive in an IPython Notebook, I almost always write the result to disk afterward. For example, if my data is a list of JSON objects, I write it to disk as line-separated, JSON-formatted strings:
import json

with open('path_to_file.json', 'a') as file:
    for item in data:
        line = json.dumps(item)
        file.write(line + '\n')
You can then read back in the data the same way:
data = []
with open('path_to_file.json', 'r') as file:
    for line in file:
        data_item = json.loads(line)
        data.append(data_item)
I think this is good practice generally speaking because it gives you a backup. You can also use pickle for the same thing. If your data is really big, you can use gzip.open to write directly to a compressed file.
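For instance, here is a sketch of both variants, reusing the data list from above (the file names are just placeholders):

import gzip
import json
import pickle

# pickle the whole object in one go
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)

# or keep the line-delimited JSON format, but gzip-compressed
with gzip.open('data.json.gz', 'wt') as f:  # text mode, so JSON strings can be written directly
    for item in data:
        f.write(json.dumps(item) + '\n')
with gzip.open('data.json.gz', 'rt') as f:
    data = [json.loads(line) for line in f]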
EDIT
To save a scikit-learn model to disk, use joblib:
import joblib  # older scikit-learn versions used: from sklearn.externals import joblib
import numpy as np
from sklearn.cluster import KMeans

some_data = np.random.rand(100, 3)  # placeholder data
num_clusters = 4                    # placeholder parameter

km = KMeans(n_clusters=num_clusters)
km.fit(some_data)

# dump the fitted model to disk
joblib.dump(km, 'model.pkl')
# and reload it from disk
km = joblib.load('model.pkl')