google-cloud-datalab

Google Cloud Datalab - Create your own Python environment

Submitted by 拈花ヽ惹草 on 2019-12-24 10:47:46
Question: The default Google Cloud Datalab only comes with two default kernels: python2 and python3. Is it possible to create our own extra virtual environment? Many thanks,

Answer 1: Yes, you could modify the Datalab Docker image and run it on GCE: https://github.com/googledatalab/datalab/wiki/Getting-Started#using-your-modified-datalab-image-on-gce or run it locally: https://github.com/googledatalab/datalab/wiki/Getting-Started#using-datalab-locally

Source: https://stackoverflow.com/questions/51454150/google
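The answer points at the image docs without showing the kernel step; a minimal sketch of registering an extra virtualenv as its own Jupyter kernel inside a running Datalab container might look like the cell below (the env name py36env and install path are assumptions, not part of the answer):

    %%bash
    # Create a separate virtualenv and expose it to Jupyter as its own kernel.
    # The env name and path are hypothetical placeholders.
    pip install virtualenv
    virtualenv -p python3 /content/envs/py36env
    /content/envs/py36env/bin/pip install ipykernel
    /content/envs/py36env/bin/python -m ipykernel install \
        --name py36env --display-name "Python (py36env)"

After a kernel restart, the new environment should appear in the notebook's kernel picker.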

How do I install extra Python packages on Datalab if they are not supported by pip?

Submitted by 坚强是说给别人听的谎言 on 2019-12-24 02:00:55
Question: I tried to install basemap within Datalab using pip:

    %bash
    pip install basemap

and got the error:

    Downloading/unpacking basemap
    Could not find any downloads that satisfy the requirement basemap
    Cleaning up...
    No distributions at all found for basemap
    Storing debug log for failure in /root/.pip/pip.log

How do I install extra packages on Datalab if they are not supported by pip?

Answer 1: Use apt-get install. In a cell of your notebook:

    %%bash
    apt-get -y update
    apt-get -y install python-mpltoolkits
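If the install succeeds, a quick import check can confirm it; this is a sketch assuming the apt package puts basemap on the kernel's default Python path:

    # Verify the apt-installed package is importable from the notebook kernel.
    from mpl_toolkits.basemap import Basemap
    m = Basemap(projection='ortho', lat_0=0, lon_0=0)
    print(type(m))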

How do I share my notebooks in DataLab?

Submitted by 末鹿安然 on 2019-12-23 10:39:46
Question: DataLab uses a shared service account, but I can't see my team members' notebooks. How do we share notebooks between team members?

Answer 1: Notebooks are stored in a git repository. If you click the "git Repository" icon on the notebook listing page it will take you to the Cloud Repo page in Google Cloud Dev Console. Pick the datalab_main branch in the dropdown and you will see files ready for commit. Once you commit, other users can click Refresh on the same page in Dev Console (Source | Browse
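For teammates who prefer the command line, a hedged alternative is to clone the repository that backs the notebooks directly; the repository name datalab-notebooks and the target path below are assumptions, since deployments differ:

    %%bash
    # Clone the Cloud Source Repository backing Datalab notebooks so that
    # teammates' committed notebooks can be pulled.
    gcloud source repos clone datalab-notebooks /content/datalab/shared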

How can I get the Cloud ML service account programmatically in Python?

Submitted by 余生颓废 on 2019-12-20 02:28:17
Question: The Cloud ML instructions show how to obtain the service account using shell commands. How can I do this programmatically in Python, e.g. in Datalab?

Answer 1: You can use Google Cloud's Python client libraries to issue the getConfig request.

    from googleapiclient import discovery
    from googleapiclient import http
    from oauth2client.client import GoogleCredentials

    credentials = GoogleCredentials.get_application_default()
    ml_client = discovery.build(
        'ml', 'v1beta1', requestBuilder=http.HttpRequest,
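The snippet breaks off above in this copy; a hedged completion, assuming the projects.getConfig method of the Cloud ML API and using a placeholder project id:

    # Completed sketch: ask the Cloud ML API which service account it uses.
    from googleapiclient import discovery
    from oauth2client.client import GoogleCredentials

    credentials = GoogleCredentials.get_application_default()
    ml_client = discovery.build('ml', 'v1beta1', credentials=credentials)

    # 'my-project' is a placeholder for your GCP project id.
    response = ml_client.projects().getConfig(
        name='projects/my-project').execute()
    print(response['serviceAccount'])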

How to read data from Google Cloud Storage into Google Cloud Datalab

Submitted by 空扰寡人 on 2019-12-20 01:43:30
Question: I have a few CSV files stored in Google Storage and I want to read them into Google Datalab. So far, I have no idea how to do it. I found this and followed the first answer, but it didn't work and raised

    File "<ipython-input-1-5e9607fa3f65>", line 5
    %%gcs read --object $data_csv --variable data
    ^
    SyntaxError: invalid syntax

Any help will be appreciated.

Answer 1: If you subtract one of the % symbols it should work. Minimal example:

    import google.datalab.storage as storage
    import pandas as pd
    from io
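The minimal example is cut off above; a hedged completion, with placeholder bucket and object names:

    # %gcs with a single % is a line magic, so it can share a cell with
    # ordinary Python. 'my-bucket' and 'myFile.csv' are hypothetical.
    import google.datalab.storage as storage
    import pandas as pd
    from io import BytesIO

    data_csv = storage.Bucket('my-bucket').object('myFile.csv').uri
    %gcs read --object $data_csv --variable data
    df = pd.read_csv(BytesIO(data))
    df.head()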

How can I load my CSV from Google Datalab into a pandas DataFrame?

Submitted by 半腔热情 on 2019-12-17 20:35:58
Question: Here is what I tried (IPython notebook, with Python 2.7):

    import gcp
    import gcp.storage as storage
    import gcp.bigquery as bq
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    sample_bucket_name = gcp.Context.default().project_id + '-datalab'
    sample_bucket_path = 'gs://' + sample_bucket_name
    sample_bucket_object = sample_bucket_path + '/myFile.csv'
    sample_bucket = storage.Bucket(sample_bucket_name)

    df = bq.Query(sample_bucket_object).to_dataframe()

Which fails. would you
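This copy cuts off before any answer, but the immediate problem is that bq.Query expects a SQL string, not a gs:// path. A hedged sketch of one way to load the CSV via the newer google.datalab API (the deprecated gcp package's successor); the bucket and file names are placeholders:

    # Read the object's bytes from GCS and hand them to pandas.
    import pandas as pd
    from io import BytesIO
    import google.datalab.storage as storage

    obj = storage.Bucket('my-project-datalab').object('myFile.csv')
    df = pd.read_csv(BytesIO(obj.read_stream()))
    df.head()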

Create and replace BigQuery tables

Submitted by 自作多情 on 2019-12-13 03:46:31
Question: How do I create and replace an existing BigQuery table? I use Datalab to define BigQuery queries and write the results to BigQuery tables. The most efficient way I found to do this is:

    %%bq query --name helloWorld
    Select * from someTable

followed by

    %%bq execute --table schemaName.destination_table --query helloWorld

However, I have to manually drop the table each time. From the command line I can execute something like:

    bq query --destination_table [PROJECT_ID]:[DATASET].[TABLE] --replace '
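This copy ends before an answer appears; a hedged sketch of one way to overwrite the destination table from Datalab's Python API, assuming google.datalab.bigquery's QueryOutput.table accepts mode='overwrite':

    # Run the query and overwrite the destination table on every execution.
    import google.datalab.bigquery as bq

    query = bq.Query('Select * from someTable')
    query.execute(output_options=bq.QueryOutput.table(
        'schemaName.destination_table', mode='overwrite'))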

Is it possible to refer to queries defined in a previous %%sql module in a later module?

Submitted by 拈花ヽ惹草 on 2019-12-13 02:33:15
Question: I just started working with the new Google Cloud Datalab and IPython last week (though I've been using BigQuery for a few months). The tutorials and samples on GitHub are very helpful, but as my scripts and queries become more complex I'm wondering a few things. The first one is this: can I refer to queries defined in one %%sql module in a later %%sql module? The other, somewhat related question is: can I somehow store the results from one %%sql module and then put that information into
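For the second question, a hedged sketch of the pattern the old gcp.bigquery library supported (module and query names are hypothetical, and this API is the deprecated predecessor of google.datalab.bigquery):

    %%sql --module my_queries
    DEFINE QUERY base_query
    SELECT corpus FROM [publicdata:samples.shakespeare] GROUP BY corpus

Then, in a Python cell, materialize the module's query and keep the results around:

    # Reference the %%sql module's query from Python and store its results.
    import gcp.bigquery as bq
    df = bq.Query(my_queries.base_query).to_dataframe()
    df.head()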

Issue querying a Hive table in Datalab

Submitted by 流过昼夜 on 2019-12-12 15:52:20
Question: I have created a Dataproc cluster with an updated init action to install Datalab. All works fine, except that when I query a Hive table from the Datalab notebook with

    hc.sql("""select * from invoices limit 10""")

I run into a

    java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

exception.

Create cluster:

    gcloud beta dataproc clusters create ds-cluster \
      --project my-exercise-project \
      --region us-west1 \
      --zone us-west1-b \
      --bucket dataproc-datalab
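This copy ends inside the cluster command and before any answer. A hedged note: this ClassNotFoundException typically means the GCS connector jar is not on the Spark classpath of the session the notebook created. One sketch of a workaround, where the jar path is an assumption and varies across Dataproc image versions:

    # Build a Hive-enabled session with the GCS connector jar explicitly on
    # the classpath; '/usr/lib/hadoop/lib/gcs-connector.jar' is a guess.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName('hive-from-datalab')
             .config('spark.jars', '/usr/lib/hadoop/lib/gcs-connector.jar')
             .enableHiveSupport()
             .getOrCreate())
    spark.sql('select * from invoices limit 10').show()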

gcsfuse on a Datalab VM. Error: fusermount: fuse device not found, try 'modprobe fuse' first

Submitted by 北慕城南 on 2019-12-12 15:35:45
Question: I have installed gcsfuse on a Datalab machine, created a target directory, and used chmod to grant write permission to all, then called:

    !gcsfuse --foreground --debug_fuse archs4 /content/datalab/mount/

I am getting the following error:

    Opening bucket...
    Mounting file system...
    mountWithArgs: mountWithConn: Mount: mount: running fusermount: exit status 1
    stderr:
    fusermount: fuse device not found, try 'modprobe fuse' first

Any idea what might solve that issue? (I am using: gcsfuse version 0.23.0 (Go
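No answer survives in this copy. A hedged note: inside a Docker container, which is how Datalab runs, this fusermount error usually means the container has no access to the host's /dev/fuse device. One sketch of a workaround when starting the Datalab container yourself; the flags are standard Docker options, not from the original thread:

    # Re-run the Datalab container with the FUSE device mapped in and the
    # capability FUSE needs; the image name is the usual local Datalab image.
    docker run -it -p 8081:8080 \
      --device /dev/fuse \
      --cap-add SYS_ADMIN \
      gcr.io/cloud-datalab/datalab:local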