Google Cloud Storage + Python: Any way to list objects in a certain folder in GCS?

岁酱吖の, submitted on 2019-12-30 06:09:40

Question


I'm writing a Python program to check whether a file exists in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in the folder (a list of file names) and then check whether abc.txt is in that list.

The problem is that Google seems to provide only one way to get an object list, uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
    print '  "%s"' % obj.get_contents_as_string()

The drawback of uri.get_bucket() is that it seems to fetch all of the objects first, which is not what I want. I just need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.

Could someone help? I appreciate every answer!


Answer 1:


You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

import json

from googleapiclient import discovery

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
# Keep requesting pages until list_next() returns None (no more results).
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    request = client.objects().list_next(request, response)

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list

And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
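To get from the raw JSON to the check the question asks for, the object names have to be pulled out of each page. A minimal sketch, assuming each page is the dict returned by request.execute() and that matching objects appear under its "items" key, each with a "name" field, as the objects list reference describes. The helper name names_in_page and the sample page are hypothetical, written by hand for illustration:

```python
def names_in_page(response):
    """Return the object names contained in one page of an objects.list response."""
    # A page with no matches may omit the "items" key, so default to an empty list.
    return [item["name"] for item in response.get("items", [])]


# Hand-written sample page for illustration, not real API output.
sample_page = {
    "kind": "storage#objects",
    "items": [
        {"name": "abc/myfolder/abc.txt"},
        {"name": "abc/myfolder/other.txt"},
    ],
}

print("abc/myfolder/abc.txt" in names_in_page(sample_page))
```

Inside the paging loop, the same helper could be called on each response, stopping as soon as the wanted name turns up.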




Answer 2:


This worked for me:

from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)

blobs = bucket.list_blobs()

for blob in blobs:
    print(blob.name)

The list_blobs() method will return an iterator used to find blobs in the bucket. Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
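Since the question is about a single folder rather than the whole bucket, list_blobs() can also be given a prefix so that only matching names are returned. A rough sketch, assuming a bucket object from google-cloud-storage; the helper name names_under_folder and the bucket and folder names in the usage comment are hypothetical:

```python
def names_under_folder(bucket, folder):
    """Return the names of objects whose names start with `folder/`."""
    # A GCS "folder" is just a shared name prefix; the trailing slash keeps
    # "abc/myfolder" from also matching "abc/myfolder2/...".
    prefix = folder.rstrip("/") + "/"
    return [blob.name for blob in bucket.list_blobs(prefix=prefix)]


# Usage with the real client (needs credentials and network access):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("mybucket")
#   print("abc/myfolder/abc.txt" in names_under_folder(bucket, "abc/myfolder"))
```

The function only relies on list_blobs(prefix=...) and the name attribute of each blob, so it can be exercised with a stand-in bucket object in tests.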

This documentation helped me a lot:

  • https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html

  • https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket

I hope this helps!




Answer 3:


You might also want to look at gcloud-python and its documentation.

from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')

for key in bucket:
  if key.name == 'abc.txt':
    print('Found it!')
    break

However, you might be better off just checking if the file exists:

if 'abc.txt' in bucket:
  print('Found it!')
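With the newer google-cloud-storage package, a direct existence check might look like the sketch below. It assumes a bucket object from that package and relies on Bucket.get_blob() returning None for a missing object; the helper name object_exists and the names in the usage comment are hypothetical:

```python
def object_exists(bucket, object_name):
    """Return True if an object with exactly this name is in the bucket."""
    # get_blob() returns None when the object is not found,
    # so the folder never has to be listed.
    return bucket.get_blob(object_name) is not None


# Usage with the real client (needs credentials and network access):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("mybucket")
#   if object_exists(bucket, "abc/myfolder/abc.txt"):
#       print("Found it!")
```

Note that the full object name, including the folder prefix, has to be passed, since folders in GCS are only part of the object name.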


Source: https://stackoverflow.com/questions/22398898/google-cloud-storage-python-any-way-to-list-obj-in-certain-folder-in-gcs
