Question
Is there any way to append two files in GCS? Suppose the first file is a full
load and the second file is an incremental load. How can we append the two?
Secondly, gsutil compose appends the two files including the attribute names
(the header row) as well. In the final file I want only the data of the two files.
Answer 1:
You can append two separate files with gsutil compose in Google Cloud Shell, giving the output object the same name as the first file so that it is overwritten, like this:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/obj1
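This command is intended for parallel uploads, in which a large file is split into smaller objects that are uploaded to Google Cloud Storage and then composed back into the original file. Because compose concatenates the source objects byte for byte, a header row in the second file is carried into the result. See the Cloud Storage documentation on Composite Objects and Parallel Uploads for more information.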
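For reference, the same in-place append can be done from Python. This is a minimal sketch assuming the google-cloud-storage client library, whose Blob.compose() takes a list of source blobs; the helper name append_in_place is hypothetical. Like the gsutil command, it simply concatenates bytes, so the second file's header row ends up in the middle of the result.

```python
from typing import Any

MAX_COMPOSE_SOURCES = 32  # Cloud Storage limit on source objects per compose request


def append_in_place(bucket: Any, base_name: str, *extra_names: str) -> Any:
    """Hypothetical helper: overwrite base_name with base_name + extra_names.

    `bucket` is expected to behave like google.cloud.storage.Bucket,
    e.g. storage.Client().bucket("bucket"). The append is a plain byte
    concatenation, so header rows in the extra objects are kept.
    """
    if not extra_names:
        raise ValueError("need at least one object to append")
    if 1 + len(extra_names) > MAX_COMPOSE_SOURCES:
        raise ValueError(f"compose accepts at most {MAX_COMPOSE_SOURCES} sources")
    sources = [bucket.blob(base_name)] + [bucket.blob(n) for n in extra_names]
    # Same destination name as the first source, as in the gsutil example.
    destination = bucket.blob(base_name)
    destination.compose(sources)
    return destination
```

Calling append_in_place(storage.Client().bucket("bucket"), "obj1", "obj2") corresponds to gsutil compose gs://bucket/obj1 gs://bucket/obj2 gs://bucket/obj1.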
I've come up with two possible solutions:
Google Cloud Function solution
The option I would go for is a Cloud Function, along these lines:
- Create an empty bucket, such as append_bucket.
- Upload the first file.
- Create a Cloud Function that is triggered by new files uploaded to the bucket.
- Upload the second file.
- Read the first and second files (you will have to download them as strings first).
- Perform the append.
- Upload the result to the bucket (make sure the function skips this object, because the upload fires the trigger again).
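The steps above can be sketched as follows, assuming a 1st-gen Python background function on a Cloud Storage finalize trigger, the google-cloud-storage client library, and CSV files whose first line is the header. FULL_LOAD_NAME, merge_csv_text and append_on_upload are hypothetical names. The merge drops the incremental file's header when it matches the full load's header, and the result is written back over the full-load object so later increments keep accumulating; since that write fires the trigger again, events for the full-load object are ignored.

```python
FULL_LOAD_NAME = "full_load.csv"  # hypothetical name of the first (full load) file


def merge_csv_text(full_text: str, incremental_text: str) -> str:
    """Append incremental rows to full_text, dropping a repeated header line."""
    full_lines = full_text.splitlines(keepends=True)
    inc_lines = incremental_text.splitlines(keepends=True)
    if not full_lines:
        return incremental_text
    header = full_lines[0].rstrip("\r\n")
    if inc_lines and inc_lines[0].rstrip("\r\n") == header:
        inc_lines = inc_lines[1:]  # same schema: skip the second header row
    if inc_lines and not full_lines[-1].endswith("\n"):
        full_lines[-1] += "\n"  # make sure appended rows start on a new line
    return "".join(full_lines + inc_lines)


def append_on_upload(event, context):
    """Background function for a google.storage.object.finalize trigger (1st gen)."""
    name = event["name"]
    if name == FULL_LOAD_NAME:
        return  # first upload, or our own write-back below: nothing to append
    # Imported lazily; requires the google-cloud-storage package.
    from google.cloud import storage

    bucket = storage.Client().bucket(event["bucket"])
    full_blob = bucket.blob(FULL_LOAD_NAME)
    # download_as_text() needs a recent library; older versions offer download_as_string().
    full_text = full_blob.download_as_text()
    incremental_text = bucket.blob(name).download_as_text()
    merged = merge_csv_text(full_text, incremental_text)
    full_blob.upload_from_string(merged, content_type="text/csv")
```

This read-modify-write is not safe if two increments arrive at the same moment, so treat it as a starting point rather than production code.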
Google Dataflow solution
You can also do it with Dataflow and BigQuery (keep in mind that it was still in beta when this answer was written).
- Create a BigQuery dataset and table.
- Create a Dataflow job from the Cloud Storage Text to BigQuery template.
- Create a JavaScript file with the logic to transform the text.
- Upload your files to the bucket in JSON format.
- Dataflow reads the files, runs the JavaScript code, and appends the new data to the BigQuery table.
- Finally, export the query result from BigQuery to Cloud Storage.
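The transform in the JavaScript file is applied to each line of input text and returns a JSON string whose keys match the BigQuery table's columns. The template itself expects that function in JavaScript; purely to illustrate the per-line logic in the same language as the other sketches here, this is a Python equivalent assuming headered CSV input with hypothetical columns id and value. Returning None for the header line is this sketch's own convention; how header lines are actually filtered in the pipeline should be checked against the template's documentation.

```python
import csv
import json
from typing import Optional

COLUMNS = ["id", "value"]  # hypothetical schema shared by the full and incremental files


def transform(line: str) -> Optional[str]:
    """Python mirror of a per-line UDF: one CSV line -> JSON string for BigQuery."""
    values = next(csv.reader([line]))  # handles quoted fields containing commas
    if values == COLUMNS:
        return None  # header row; in this sketch None means "drop this line"
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}: {line!r}")
    return json.dumps(dict(zip(COLUMNS, values)))
```

Because both files share the same schema, every data line from either file maps to the same JSON shape, and the repeated header is what has to be filtered out.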
Source: https://stackoverflow.com/questions/53487432/how-to-append-files-in-gcs-with-the-same-schema