Bigtable CSV import

我怕爱的太早我们不能终老 submitted on 2019-12-07 15:39:39

Question


I have a large CSV dataset (>5TB) split across multiple files in a storage bucket that I need to import into Google Bigtable. The files are in the format:

rowkey,s1,s2,s3,s4
text,int,int,int,int
...

HBase has an importtsv function that would be perfect, but it does not seem to be available when using the Google HBase shell on Windows. Is it possible to use this tool? If not, what is the fastest way of achieving this? I have little experience with HBase and Google Cloud, so a simple example would be great. I have seen some similar examples using Dataflow but would prefer not to learn how to do this unless necessary.

Thanks


Answer 1:


The best way to import a dataset this large into Cloud Bigtable is to first put the files on Google Cloud Storage (skip this step if they are already in a bucket):

  • gsutil mb gs://<your-bucket-name>
  • gsutil -m cp -r <source dir> gs://<your-bucket-name>/

Then use Cloud Dataflow.

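  1. Use the HBase shell to create the table and its column family. Column qualifiers do not need to be created in advance; they come into existence when the first cell is written.

  2. Write a small Dataflow job that reads all the files, builds a row key for each line, and writes the row to the table. (See this example to get started.)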
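
The per-line work in step 2 is small. Below is a minimal sketch of it in plain Python, with no Dataflow or Bigtable client calls, assuming the first field is the row key and the remaining four fields are the integer columns s1 to s4. The column family name `cf` and the function name `csv_line_to_cells` are hypothetical, and storing each value as its text bytes is an assumption rather than a requirement.

```python
import csv

COLUMN_FAMILY = "cf"  # hypothetical family name; use the one created in the HBase shell
QUALIFIERS = ["s1", "s2", "s3", "s4"]  # column names from the format shown in the question


def csv_line_to_cells(line):
    """Turn one 'rowkey,s1,s2,s3,s4' line into (row_key, {column: value}), all as bytes."""
    fields = next(csv.reader([line]))
    if len(fields) != len(QUALIFIERS) + 1:
        raise ValueError(f"expected {len(QUALIFIERS) + 1} fields, got {len(fields)}")
    row_key, values = fields[0], fields[1:]
    cells = {}
    for qualifier, value in zip(QUALIFIERS, values):
        int(value)  # validation only: raises ValueError on non-integer fields
        # Assumption: store the value as its text bytes, not as a binary integer.
        cells[f"{COLUMN_FAMILY}:{qualifier}".encode("utf-8")] = value.encode("utf-8")
    return row_key.encode("utf-8"), cells


if __name__ == "__main__":
    key, cells = csv_line_to_cells("user#42,1,2,3,4")
    print(key, cells)
```

A header line such as `rowkey,s1,s2,s3,s4` fails the integer check and raises `ValueError`, so a job built around this function would need to skip or catch such lines.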

A somewhat easier approach (note: untested) would be to:

  • Copy your files to Google Cloud Storage.
  • Use Google Cloud Dataproc; the example shows how to create a cluster and hook it up to Cloud Bigtable.
  • SSH to your cluster master. The script in the wordcount-mapreduce example accepts ./cluster ssh.
  • Use the HBase TSV importer to start a MapReduce job. For comma-separated input, set the separator explicitly and list the row key and one family:qualifier per column:

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,<family>:s1,<family>:s2,<family>:s3,<family>:s4 <tablename> gs://<your-bucket-name>/<dir>/**
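
ImportTsv expects the -Dimporttsv.columns value to contain exactly one HBASE_ROW_KEY entry, with every other field given as family:qualifier. As a rough sketch, that value can be derived from the column names in the question; the helper name `importtsv_columns` and the family name `cf` are hypothetical, and the first field is assumed to be the row key.

```python
def importtsv_columns(header, family="cf"):
    """Build the -Dimporttsv.columns value from a comma-separated header.

    Assumes the first field is the row key; 'cf' is a placeholder family name.
    """
    names = [name.strip() for name in header.split(",")]
    qualifiers = names[1:]  # everything after the row key field
    return ",".join(["HBASE_ROW_KEY"] + [f"{family}:{q}" for q in qualifiers])


if __name__ == "__main__":
    print(importtsv_columns("rowkey,s1,s2,s3,s4"))
    # prints HBASE_ROW_KEY,cf:s1,cf:s2,cf:s3,cf:s4
```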




Answer 2:


I filed a bug on the Cloud Bigtable Client project asking for a way to run importtsv.

Even if we can get importtsv to work, setting up Bigtable on your own machine may take some doing. Importing a dataset this large is a lot for a single machine, so a distributed job (Hadoop or Dataflow) is usually needed; I'm not sure how well running the job from your own machine would work.



Source: https://stackoverflow.com/questions/34104427/bigtable-csv-import
