google-cloud-datalab

Google Datalab: how to import pickle

Submitted by 牧云@^-^@ on 2019-12-06 05:23:58
Is it possible in Google Datalab to read pickle/joblib models from Google Cloud Storage using the %%storage magic? This question relates to "Is text the only content type for the %%storage magic function in Datalab?"

Run the following code in an otherwise empty cell:

    %%storage read --object <path-to-gcs-bucket>/my_pickle_file.pkl --variable test_pickle_var

Then run the following code:

    import pickle
    from io import BytesIO

    pickle.load(BytesIO(test_pickle_var))

I used the code below to upload a pandas DataFrame to Google Cloud Storage as a pickled file and read it back:

    from datalab.context import Context
    import datalab.storage
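The excerpt is cut off there. A hedged reconstruction of the round trip with that legacy API (the bucket name and object key are placeholders, and write_to()/read_from() are assumed from the old pydatalab Item interface rather than taken from the original answer) might look like this:

    # Hedged sketch: round-trip a pickled DataFrame through GCS with the
    # legacy pydatalab API. Names are placeholders; adjust for your project.
    import pickle
    import pandas as pd
    import datalab.storage as gcs
    from datalab.context import Context

    bucket_name = Context.default().project_id + '-datalab-example'  # placeholder
    bucket = gcs.Bucket(bucket_name)

    # Upload: serialize in memory, then write the bytes to an object.
    df = pd.DataFrame({'a': [1, 2, 3]})
    bucket.item('my_pickle_file.pkl').write_to(pickle.dumps(df),
                                               'application/octet-stream')

    # Download: read the bytes back and unpickle.
    restored = pickle.loads(bucket.item('my_pickle_file.pkl').read_from())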

Reading batches of data from BigQuery into Datalab

Submitted by 耗尽温柔 on 2019-12-05 22:13:51
I have a big dataset in a BigQuery table (~45M rows, 13 GB of data). I would like to process it in my Google Datalab notebook, compute some basic statistics with pandas, and visualise the data later with matplotlib in a Datalab cell. Loading the whole dataset into a pandas DataFrame is probably a bad idea (at the very least I will run into RAM issues). Is it possible to read data from BigQuery in batches (say 10K rows) and consume it in Datalab? Thanks in advance!

If your purpose is to visualize the data, wouldn't sampling be better than loading a small batch? You can sample your data, for example:

    import google
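The excerpt stops at that import. A hedged sketch of server-side sampling with google.datalab.bigquery (the table name and sampling rate are placeholders, and the Query/execute/result chain is assumed from the pydatalab API rather than quoted from the answer) could be:

    # Sample ~0.1% of the rows in BigQuery so only a small frame reaches pandas.
    # Dataset/table name and the sampling rate are placeholders.
    import google.datalab.bigquery as bq

    sql = 'SELECT * FROM my_dataset.big_table WHERE RAND() < 0.001'
    df = bq.Query(sql).execute().result().to_dataframe()
    df.describe()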

Loading multiple files from Google Cloud Storage into a single Pandas Dataframe

Submitted by 倖福魔咒の on 2019-12-05 20:55:46
I have been trying to write a function that loads multiple files from a Google Cloud Storage bucket into a single pandas DataFrame, but I cannot seem to make it work.

    import pandas as pd
    from google.datalab import storage
    from io import BytesIO

    def gcs_loader(bucket_name, prefix):
        bucket = storage.Bucket(bucket_name)
        df = pd.DataFrame()
        for shard in bucket.objects(prefix=prefix):
            fp = shard.uri
            %gcs read -o $fp -v tmp
            df.append(pd.read_csv(BytesIO(tmp)))
        return df

When I try to run it, it says:

    undefined variable referenced in command line: $fp

Sure, here's an example: https://colab.research
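The linked example is cut off. As an alternative sketch that avoids cell magics inside a function (assuming Bucket.objects(prefix=...) and Object.read_stream() behave as documented in google.datalab.storage; bucket and prefix names are placeholders), each shard can be read through the API directly and concatenated once:

    # Concatenate all CSV shards under a prefix into one DataFrame without %gcs.
    import pandas as pd
    from io import BytesIO
    from google.datalab import storage

    def gcs_loader(bucket_name, prefix):
        bucket = storage.Bucket(bucket_name)
        frames = []
        for shard in bucket.objects(prefix=prefix):
            data = shard.read_stream()          # bytes of one CSV shard (assumed API)
            frames.append(pd.read_csv(BytesIO(data)))
        return pd.concat(frames, ignore_index=True)

    # df = gcs_loader('my-bucket', 'exports/part-')   # placeholder names

Note that pd.DataFrame.append returns a new frame rather than mutating in place, which is why the original loop produced an empty result even when the read succeeded; collecting frames and calling pd.concat once avoids that.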

How to add a 'private' Python module to Google Datalab

Submitted by 烂漫一生 on 2019-12-04 14:38:49
I'm experimenting with the promising Google Cloud Datalab. In the past I've written some handy Python classes and functions that I'd like to use in my Datalab notebooks, but I don't know how to add my own code. Does anybody have any suggestions?

This is, on second thought, obvious:

    %%bash
    pip install git+http://myawsomepythonmodule.git

Source: https://stackoverflow.com/questions/33165443/how-to-add-private-python-module-to-google-datalab
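If the module isn't reachable as a pip-installable git remote, a hedged alternative (the GCS path, local directory, and module name below are all placeholders) is to copy the files into the notebook VM and extend sys.path:

    # Copy a private module from GCS into the notebook environment and import it.
    !mkdir -p /content/my_modules
    !gsutil cp gs://my-bucket/handy_helpers.py /content/my_modules/

    import sys
    sys.path.append('/content/my_modules')   # placeholder directory
    import handy_helpers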

How to correctly stop Google Cloud Datalab

Submitted by ε祈祈猫儿з on 2019-12-03 13:44:48
Question: Playing with data in Jupyter/Datalab is a joy, but I do not want it to become costly. Google recommends: "You can minimize compute charges by stopping/restarting Cloud Datalab instances." However, if I stop the App Engine instance or the Compute Engine VM instance, they simply restart. So how do I correctly stop/pause Google Cloud Datalab, so that I'm only charged for my use and not for idle time? Is there some kind of trigger that restarts the instances?

Answer 1: Here's what I'm doing. I like
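The answer is cut off there. For the newer Compute Engine based Datalab, the usual pattern is to stop and restart the VM with the datalab CLI (installed as a gcloud component), so that you are no longer billed for the VM's compute time while it is down. The instance name and zone below are placeholders, and this is a sketch of the standard commands rather than the truncated answer:

    # Stop the VM that backs Datalab; notebooks persist on the attached disk.
    datalab stop my-datalab-instance

    # Later, restart the VM and re-open the SSH tunnel to the notebook server.
    datalab connect my-datalab-instance

    # Roughly equivalent low-level command for the underlying Compute Engine VM:
    gcloud compute instances stop my-datalab-instance --zone us-central1-a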

Google Colaboratory vs Google Datalab. How are they different?

Submitted by 非 Y 不嫁゛ on 2019-12-02 20:52:29
I understand both are built on Jupyter notebooks but run in the cloud. Why do we have two, then?

Jupyter is the only thing these two services have in common. Colaboratory is a tool for education and research. It doesn't require any setup or other Google products to be used (although notebooks are stored in Google Drive). It's intended primarily for interactive use, and long-running background computations may be stopped. It currently only supports Python.

Cloud Datalab allows you to analyse data using Google Cloud resources. You can take full advantage of scalable services such as BigQuery and

Can't deploy Google Cloud Datalab - Application in non-US zone

Submitted by 自古美人都是妖i on 2019-12-02 03:59:17
I've selected my Google API project 4 times now and pushed "Deploy DataLab", but whenever I check back I have no Datalab project. The last time I got the following error message, even though I have billing enabled, am the owner, and the BigQuery and Google Compute Engine APIs are activated. Checking the logs, it says I'm in the wrong region:

    Oct 13 19:42:35 datalab-deploy-main-20151013-19-40-34 startupscript: Pushing tag for rev [b886390e822d] on {https://gcr.io/v1/repositories/_m_sdk/mark-edmondson-gde.datalab.main/tags/latest}
    Oct 13 19:42:36 datalab-deploy-main-20151013-19-40-34 startupscript: 07:42 PM
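The log excerpt ends there. The error suggests the project's App Engine application lives outside the region the Datalab deployment expects; a hedged way to check (this is a standard gcloud command, but whether it is the decisive setting for your deployment is an assumption) is to look up the App Engine location, which cannot be changed after the app is created:

    # Print the region the project's App Engine application was created in.
    gcloud app describe --format="value(locationId)"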

How to use Google Cloud Storage in a Dataflow pipeline run from Datalab

Submitted by 橙三吉。 on 2019-12-02 00:55:09
We've been running a Python pipeline in Datalab that reads image files from a bucket in Google Cloud Storage (importing google.datalab.storage). Originally we were using DirectRunner and this worked fine, but now we're trying to use DataflowRunner and we're hitting import errors. Even if we include "import google.datalab.storage" (or any variant thereof) inside the function run by the pipeline, we get errors such as "No module named 'datalab.storage'". We've also tried using the save_main_session, requirements_file, and setup_file flags, with no luck. How would we correctly access image files in
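The question is cut off there. A common workaround is to avoid google.datalab.storage inside the pipeline altogether, on the assumption that the datalab package simply isn't installed on Dataflow workers, and to read GCS objects with Beam's own filesystem API instead. The sketch below follows that assumption; the bucket path is a placeholder:

    # Minimal sketch: read image bytes inside a DoFn via Beam's FileSystems,
    # which understands gs:// paths on both DirectRunner and DataflowRunner.
    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems

    class ReadImage(beam.DoFn):
        def process(self, gcs_path):
            f = FileSystems.open(gcs_path)   # file-like object for the GCS object
            yield (gcs_path, f.read())

    with beam.Pipeline() as p:
        (p
         | beam.Create(['gs://my-bucket/images/img-0001.png'])  # placeholder path
         | beam.ParDo(ReadImage()))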