google-cloud-storage

Streaming write to GCS per element using Apache Beam

Submitted by 我是研究僧i on 2021-02-10 16:02:21
Question: The current Beam pipeline reads files as a stream using FileIO.matchAll().continuously(). This returns a PCollection. I want to write these files back with the same names to another GCS bucket, i.e. each PCollection element is one file metadata/ReadableFile that should be written back to another bucket after some processing. Is there any sink I should use to write each PCollection item back to GCS, or are there other ways to do it? Is it possible to create a window per element and then use …
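The question targets the Java SDK; as a rough sketch of the per-element idea only, here is a minimal version in the Beam Python SDK (the language used for all examples on this page), assuming fileio.MatchContinuously / ReadMatches and hypothetical bucket names: a DoFn copies each matched file to the destination bucket under its original file name. In the Java SDK the closest building blocks would be FileIO.readMatches() plus a custom DoFn, or FileIO.writeDynamic() with per-element file naming.

```python
import os

import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.io.filesystems import FileSystems

SRC_PATTERN = "gs://source-bucket/incoming/*"  # hypothetical source pattern
DEST_BUCKET = "gs://destination-bucket"        # hypothetical destination bucket


class CopyToDestination(beam.DoFn):
    """Writes each matched file to the destination bucket under the same name."""

    def process(self, readable_file):
        name = os.path.basename(readable_file.metadata.path)
        dest = FileSystems.join(DEST_BUCKET, name)
        with FileSystems.create(dest) as out:
            out.write(readable_file.read())  # read() returns the file's bytes
        yield dest


with beam.Pipeline() as pipeline:
    (pipeline
     | "MatchContinuously" >> fileio.MatchContinuously(SRC_PATTERN, interval=60)
     | "ReadMatches" >> fileio.ReadMatches()
     | "CopyEachFile" >> beam.ParDo(CopyToDestination()))
```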

Load data from a Google Cloud bucket

Submitted by 匆匆过客 on 2021-02-10 12:58:21
Question: Here is a function to load data from a Google Cloud bucket. action_dataset_folder_path = 'action-data-set' zip_path = 'actions.zip' url='http://console.cloud.google.com/storage/browser/actions' class LoadProgress(tqdm): last_block = 0 def hook(self, block_num=1, block_size=1, total_size=None): self.total = total_size self.update((block_num - self.last_block) * block_size) self.last_block = block_num if not isfile(zip_path): with LoadProgress(unit='B', unit_scale=True, miniters=1, desc='actions …
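The snippet above is cut off by the listing; a hedged reconstruction of the pattern it follows (urlretrieve plus a tqdm progress hook) looks like this. The direct-download URL is an assumption: the console.cloud.google.com "browser" address in the question is an HTML page, not an object URL, so a storage.googleapis.com object URL is substituted here.

```python
from os.path import isfile
from urllib.request import urlretrieve

from tqdm import tqdm

zip_path = "actions.zip"
# Assumed direct-download form of a public object (bucket/object names are placeholders).
url = "https://storage.googleapis.com/actions/actions.zip"


class LoadProgress(tqdm):
    """tqdm subclass whose hook() matches urlretrieve's reporthook signature."""
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num


if not isfile(zip_path):
    with LoadProgress(unit="B", unit_scale=True, miniters=1, desc="actions") as pbar:
        urlretrieve(url, zip_path, pbar.hook)
```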

Listing buckets with Google Cloud Storage resulting in NoSuchMethodError, Java AppEngine

Submitted by 随声附和 on 2021-02-10 12:32:14
Question: Trying to just list the buckets in my Google Cloud Storage project, but I can't quite understand why I keep getting the following error: java.lang.NoSuchMethodError: com.google.api.services.storage.model.Bucket.getIamConfiguration()Lcom/google/api/services/storage/model/Bucket$IamConfiguration; I'm testing it with the following servlet: package servlets; import java.io.IOException; import javax.servlet.ServletException; import javax.servlet.annotation.WebServlet; import javax.servlet.http …

GCS - Python download blobs with directory structure

Submitted by 烂漫一生 on 2021-02-08 20:53:10
Question: I'm using a combination of the GCS Python SDK and the Google API client to loop through a version-enabled bucket and download specific objects based on metadata. from google.cloud import storage from googleapiclient import discovery from oauth2client.client import GoogleCredentials def downloadepoch_objects(): request = service.objects().list( bucket=bucket_name, versions=True ) response = request.execute() for item in response['items']: if item['metadata']['epoch'] == restore_epoch: print(item[ …
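A minimal sketch of the same task using only the google-cloud-storage client (no separate discovery client): list object versions, filter on the epoch metadata key from the question, and recreate each object's "directory" structure locally. The bucket name, epoch value, and destination directory are placeholders.

```python
import os

from google.cloud import storage

bucket_name = "my-versioned-bucket"  # hypothetical bucket
restore_epoch = "1580000000"         # hypothetical metadata value to match
dest_root = "restored"               # hypothetical local destination directory

client = storage.Client()
for blob in client.list_blobs(bucket_name, versions=True):
    metadata = blob.metadata or {}
    if metadata.get("epoch") != restore_epoch:
        continue
    # Recreate the object's path under dest_root before downloading.
    local_path = os.path.join(dest_root, blob.name)
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    blob.download_to_filename(local_path)
```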

Listing all public links for all objects in a bucket using gsutil

Submitted by 妖精的绣舞 on 2021-02-08 15:05:15
Question: Is there a way to list all the public links for all the objects stored in a Google Cloud Storage bucket (or a directory in a bucket) using the Cloud SDK's gsutil or gcloud? Something like: $ gsutil ls --public-link gs://my-bucket/a-directory Answer 1: Public links for publicly visible objects are predictable. They just match this pattern: https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME . gsutil doesn't have a command to print URLs for objects in a bucket, but it can just list objects. You could …
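A short sketch of the answer's idea using the Python client instead of gsutil: list the objects under a prefix and print the predictable public URL for each (the client also exposes this as blob.public_url). The bucket name and prefix are placeholders; nothing here makes the objects public, it only prints the URL each object would have if it were publicly readable.

```python
from google.cloud import storage

bucket_name = "my-bucket"   # hypothetical bucket
prefix = "a-directory/"     # hypothetical "directory" prefix

client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix=prefix):
    # Equivalent to f"https://storage.googleapis.com/{bucket_name}/{blob.name}"
    print(blob.public_url)
```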

What's the risk in using project-id in GCS bucket names?

Submitted by 风格不统一 on 2021-02-08 15:01:35
Question: I've been using the project ID as a prefix in my GCS bucket names to easily get a unique name. When I read the GCS best practices, it clearly says not to use project names or project numbers (it says nothing about project IDs). But on the other hand, when I spin up GAE, two buckets containing the project ID are automatically created. Is Google not following their own best practices, or did I miss something? Is the greatest risk of having the project ID in a bucket name that it gives clues to a potential attacker about …

How to createWriteStream() to GCS?

Submitted by 狂风中的少年 on 2021-02-08 09:51:30
Question: I'm trying to write an Express route that takes an image URI in the POST body and then saves the image into a Google Cloud Storage bucket. I'm not able to persist this image to local disk and need to stream the buffer straight to the GCS bucket. My route creates a 4KB "stub" in the GCS bucket but there's no image payload. My Node.js process then proceeds to crash... Q: What is the correct way to .pipe() the results of the https.request() to blob.createWriteStream()? Is this the right approach? I've …
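The question is about Node.js; as a rough sketch of the same idea in Python (the language used for the other examples here), the HTTP response body can be streamed into the object in chunks without touching local disk. This assumes a reasonably recent google-cloud-storage release (Blob.open was added around version 1.38) and placeholder URL and bucket names.

```python
import requests
from google.cloud import storage

IMAGE_URL = "https://example.com/image.jpg"  # hypothetical source image URI
BUCKET_NAME = "my-upload-bucket"             # hypothetical destination bucket

client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob("image.jpg")

with requests.get(IMAGE_URL, stream=True) as resp:
    resp.raise_for_status()
    # Stream the response body into the object chunk by chunk; nothing is
    # written to local disk.
    with blob.open("wb", content_type=resp.headers.get("Content-Type")) as out:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            out.write(chunk)
```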

BigQuery Data Transfer Service with BigQuery partitioned table [closed]

Submitted by 怎甘沉沦 on 2021-02-08 06:12:56
Question: (Closed as not reproducible or caused by typos.) I have access to a project within BigQuery. I'm looking to create a table partitioned by ingestion time, partitioned by day, and then set up a BigQuery Data Transfer Service process that brings Avro files in from multiple directories within a Google Cloud Storage bucket.
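A hedged sketch of the first half of the question: creating an ingestion-time table partitioned by day with the BigQuery Python client. The project, dataset, table, and schema names are placeholders; the GCS load itself would still be configured separately through the Data Transfer Service (data source "google_cloud_storage") or scheduled load jobs.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"  # hypothetical fully qualified table ID

schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("value", "INTEGER"),
]

table = bigquery.Table(table_id, schema=schema)
# Leaving `field` unset means the table is partitioned by ingestion time
# (the _PARTITIONTIME pseudo-column), with one partition per day.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)
client.create_table(table)
```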