Using Kaggle Datasets in Google Colab

南笙 2020-11-30 22:11

Is it possible to use any datasets available via the Kaggle API in Google Colab? I see the Kaggle API is used in this Colab notebook, but it's a bit unclear to

8 answers
  • 2020-11-30 22:44

    Step-by-step --

    1. Create an API key in Kaggle.

      To do this, go to kaggle.com/ and open your user settings page.

    2. Next, scroll down to the API access section and click generate to download an API key. This will download a file called kaggle.json to your computer. You'll use this file in Colab to access Kaggle datasets and competitions.

    3. Navigate to https://colab.research.google.com/.

    4. Upload your kaggle.json file using the following snippet in a code cell:

      from google.colab import files
      files.upload()

    5. Install the kaggle API using !pip install -q kaggle

    6. Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:

      !mkdir -p ~/.kaggle
      !cp kaggle.json ~/.kaggle/

    7. Now you can access datasets using the client, e.g., !kaggle datasets list.

    Here's a complete example notebook of the Colab portion of this process: https://colab.research.google.com/drive/1DofKEdQYaXmDWBzuResXWWvxhLgDeVyl

    This example shows uploading the kaggle.json file, installing the Kaggle API client, and using the client to download a dataset.
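The token-placement step (step 6) can also be done from Python instead of shell commands. The following is only a minimal sketch: `install_kaggle_token` is a hypothetical helper name, the demo uses a placeholder token in a temporary directory, and the 600 permission mirrors the `chmod 600` used in the second answer rather than anything this answer requires.

```python
import os
import shutil
import stat
import tempfile


def install_kaggle_token(token_path, config_dir):
    """Copy a kaggle.json token into config_dir and restrict its permissions.

    Hypothetical helper for illustration; not part of the Kaggle client.
    """
    os.makedirs(config_dir, exist_ok=True)
    target = os.path.join(config_dir, "kaggle.json")
    shutil.copyfile(token_path, target)
    # Owner read/write only, the same effect as `chmod 600`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Demo in a throwaway directory with a placeholder token (not real credentials)
workdir = tempfile.mkdtemp()
uploaded = os.path.join(workdir, "kaggle.json")
with open(uploaded, "w") as f:
    f.write('{"username": "example-user", "key": "example-key"}')

installed = install_kaggle_token(uploaded, os.path.join(workdir, ".kaggle"))
print(installed)
```

In a Colab session, the equivalent call after uploading the file would presumably be `install_kaggle_token("kaggle.json", os.path.expanduser("~/.kaggle"))`.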

  • 2020-11-30 22:44

    First, run this command to find out where the Colab notebook executes:

      !ls -d $PWD/*

    It will show something like:

      /content/data /content/gdrive /content/models

    In other words, your working directory (pwd) is /content/, so running !ls shows data gdrive models. FYI, the ! prefix lets you run Linux commands inside Colab.

    Colab clears the /content folder whenever the runtime is reset, so downloaded datasets and the kaggle.json file are gone at the start of every new session. That's why it's worth automating the setup, so you can focus on writing code instead of rebuilding the environment every time.

    Run this in a Colab code cell, substituting your own API credentials. Open your kaggle.json file to find them.

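    # Info on how to get your API key (kaggle.json): https://github.com/Kaggle/kaggle-api#api-credentials
    !pip install kaggle
    import json
    import os
    import zipfile

    # Paste the username and key from your own kaggle.json here
    api_token = {"username":"seunghunsunmoonlee","key":""}

    # Write the token and point the client at its directory
    os.makedirs('/content/.kaggle', exist_ok=True)
    os.environ['KAGGLE_CONFIG_DIR'] = '/content/.kaggle'
    with open('/content/.kaggle/kaggle.json', 'w') as file:
        json.dump(api_token, file)
    !chmod 600 /content/.kaggle/kaggle.json

    # -p sets the download directory explicitly
    !kaggle competitions download -c dog-breed-identification -p /content/competitions/dog-breed-identification
    os.chdir('/content/competitions/dog-breed-identification')
    # Unzip every archive in place, skipping files that are not zip archives
    for file in os.listdir():
        if zipfile.is_zipfile(file):
            with zipfile.ZipFile(file, 'r') as zip_ref:
                zip_ref.extractall()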
    

    Then run !ls again and you will see all the data you need. Hope it helps!
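The unzip loop in this answer can be wrapped in a small reusable function. This is a sketch only: `extract_all_zips` is a hypothetical name, and the demo builds a tiny placeholder archive in a temporary directory instead of downloading real competition files.

```python
import os
import tempfile
import zipfile


def extract_all_zips(directory):
    """Extract every zip archive found in `directory` into that directory.

    Returns the names of all extracted members. Hypothetical helper.
    """
    extracted = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not zipfile.is_zipfile(path):
            continue  # skip CSVs and anything else that is not a zip archive
        with zipfile.ZipFile(path) as archive:
            archive.extractall(directory)
            extracted.extend(archive.namelist())
    return extracted


# Demo: one placeholder archive plus one non-archive file
demo_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(demo_dir, "train.zip"), "w") as archive:
    archive.writestr("train/sample.txt", "placeholder")
with open(os.path.join(demo_dir, "labels.csv"), "w") as f:
    f.write("id,breed\n")

names = extract_all_zips(demo_dir)
print(names)
```

Checking with `zipfile.is_zipfile` first means a stray CSV in the download folder does not abort the loop, which the original `os.listdir()` loop would do when it tried to open a non-zip file.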
