Load Google Cloud Storage data into Bigtable

Submitted by 混江龙づ霸主 on 2019-12-20 05:23:38

Question


Is there an easy way or example to load Google Cloud Storage data into bigtable?

I have lots of JSON files generated by PySpark and I wish to load that data into Bigtable.

But I cannot find an easy way to do that!

I have tried the Python code from google-cloud-python and it worked fine, but it only reads data line by line into Bigtable, which seemed strange to me.

Any help would be greatly appreciated.


Answer 1:


There is no simple tool to load data into Cloud Bigtable. Here are some options:

  1. Import the files using Dataflow. This requires Java development and learning the Dataflow programming model.
  2. Use Python (possibly with PySpark) to read those JSON files, and write to Cloud Bigtable using the `mutate_rows` method, which writes to Bigtable in bulk.
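A rough sketch of option 2, using the `google-cloud-bigtable` client. The row-key choice (`"id"`), the `cf1` column family, the newline-delimited JSON layout, and the batch size are all assumptions here — adapt them to your own schema:

```python
import json

# Assumed column family; must already exist on the target table.
COLUMN_FAMILY = "cf1"

def record_to_cells(record):
    """Convert one JSON record into a (row_key, cells) pair.

    Assumes each record has an "id" field usable as the row key;
    all other fields become columns in COLUMN_FAMILY.
    """
    row_key = str(record["id"]).encode("utf-8")
    cells = {
        (COLUMN_FAMILY, field.encode("utf-8")): str(value).encode("utf-8")
        for field, value in record.items()
        if field != "id"
    }
    return row_key, cells

def load_json_lines(lines, table, batch_size=500):
    """Read newline-delimited JSON and bulk-write it with mutate_rows."""
    rows = []
    for line in lines:
        row_key, cells = record_to_cells(json.loads(line))
        direct_row = table.direct_row(row_key)
        for (family, column), value in cells.items():
            direct_row.set_cell(family, column, value)
        rows.append(direct_row)
        if len(rows) >= batch_size:
            # Bulk write; returns one status per row, worth checking in real code.
            table.mutate_rows(rows)
            rows = []
    if rows:
        table.mutate_rows(rows)
```

The `table` object would come from something like `bigtable.Client(project=...).instance(...).table(...)`; batching before each `mutate_rows` call is what avoids the line-by-line writes the question describes.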

FYI, I work on the Cloud Bigtable team. I'm a Java developer, so I opt for #1. Our team has been working to improve our Python experience. The extended team recently added some reliability improvements to make sure that `mutate_rows` is resilient for large jobs. We do not yet have any good examples of integrating with PySpark or Apache Beam's Python SDK, but they are on our radar.



Source: https://stackoverflow.com/questions/47345794/load-google-cloud-storage-data-into-bigtable
