Question:
Is there an easy way or example to load Google Cloud Storage data into Bigtable?

I have lots of JSON files generated by PySpark and I wish to load the data into Bigtable, but I cannot find an easy way to do that. I have tried the Python code from google-cloud-python and it works fine, but it only reads the data line by line into Bigtable, which seemed strange to me.

Any help would be greatly appreciated.
Answer 1:
There is no simple tool for loading data into Cloud Bigtable. Here are some options:
- Import the files using Dataflow. This requires Java development and learning the Dataflow programming model.
- Use Python (possibly with PySpark) to read those JSON files and write to Cloud Bigtable using the mutate_rows method, which writes to Bigtable in bulk (see the sketch after this list).
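A minimal sketch of option 2 might look like the following. It assumes newline-delimited JSON files, a hypothetical "id" field used as the row key, and a column family named "cf1"; the project, instance, and table names are placeholders, so adjust everything to your actual schema.

```python
import json

from google.cloud import bigtable

# Placeholder identifiers -- replace with your own project/instance/table.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("my-table")


def load_json_lines(path, batch_size=1000):
    """Read newline-delimited JSON and write to Bigtable in bulk batches."""
    rows = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            # Using "id" as the row key is an assumption; pick a key with
            # good distribution to avoid hotspotting a single tablet.
            row = table.direct_row(record["id"].encode("utf-8"))
            for column, value in record.items():
                row.set_cell("cf1",
                             column.encode("utf-8"),
                             str(value).encode("utf-8"))
            rows.append(row)
            if len(rows) >= batch_size:
                # One bulk RPC per batch; inspect statuses for per-row errors.
                statuses = table.mutate_rows(rows)
                rows = []
    if rows:
        table.mutate_rows(rows)
```

Batching before calling mutate_rows is the point of this approach: instead of one write per record (the line-by-line behavior you observed), each call sends a whole batch of mutations in a single bulk request.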
FYI, I work on the Cloud Bigtable team. I'm a Java developer, so I would opt for #1. Our team has been working to improve our Python experience. The extended team recently added some reliability improvements to make sure that mutate_rows is resilient for large jobs. We do not yet have any good examples of integrating with PySpark or Apache Beam's Python SDK, but they are on our radar.
Source: https://stackoverflow.com/questions/47345794/load-google-cloud-storage-data-into-bigtable