google-cloud-bigtable

Bigtable CSV import

Submitted by 我怕爱的太早我们不能终老 on 2019-12-07 15:39:39
Question: I have a large CSV dataset (>5 TB) split across multiple files in a Cloud Storage bucket that I need to import into Google Bigtable. The files are in the format:

rowkey,s1,s2,s3,s4
text,int,int,int,int
...

HBase has an importtsv tool that would be perfect, but it does not seem to be available when using the Google HBase shell on Windows. Is it possible to use this tool? If not, what is the fastest way to achieve this? I have little experience with HBase and Google Cloud, so a simple …
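
One commonly suggested route for a dataset this size is a Dataflow (Apache Beam) pipeline that reads the CSV files from Cloud Storage and writes Bigtable mutations. A rough Python-SDK sketch, assuming the five-column layout above, an assumed column family named 'cf1', and placeholder bucket, project, instance and table IDs:

import datetime

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow

def csv_line_to_row(line):
    # Turn one "rowkey,s1,s2,s3,s4" line into a Bigtable DirectRow ('cf1' is an assumed family).
    rowkey, s1, s2, s3, s4 = line.split(",")
    row = DirectRow(row_key=rowkey.encode("utf-8"))
    for name, value in zip(("s1", "s2", "s3", "s4"), (s1, s2, s3, s4)):
        row.set_cell("cf1", name.encode("utf-8"), value.encode("utf-8"),
                     timestamp=datetime.datetime.utcnow())
    return row

with beam.Pipeline() as p:  # run with the DataflowRunner for multi-TB input
    (p
     | "ReadCSV" >> beam.io.ReadFromText("gs://my-bucket/*.csv")   # placeholder bucket
     | "ToRows"  >> beam.Map(csv_line_to_row)
     | "Write"   >> WriteToBigTable(project_id="my-project",       # placeholder IDs
                                    instance_id="my-instance",
                                    table_id="my-table"))

Run it on Dataflow by passing the usual --runner=DataflowRunner, --project, --region and --temp_location pipeline options; the same code runs locally on a small sample with the default DirectRunner.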

How to get filtered data from Bigtable using Python?

Submitted by 不羁岁月 on 2019-12-06 13:21:27
I am using the Bigtable emulator, have successfully created a table in it, and now need to read filtered data. The table looks like this (continuing up to arc_record_id 10):

arc_record_id | record_id | batch_id
1             | 624       | 86
2             | 625       | 86
3             | 626       | 86

I have tried the Python code given below:

visit_dt_filter = ValueRangeFilter(start_value="1".encode('utf-8'), end_value="2".encode('utf-8'))
col1_filter = ColumnQualifierRegexFilter(b'arc_record_id')
chain1 = RowFilterChain(filters=[col1_filter, visit_dt_filter])
partial_rows = testTable.read_rows(filter_=chain1)
for row in partial_rows:
    cell = row.cells …
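
A minimal, self-contained sketch of this kind of filtered read with the google-cloud-bigtable client's row_filters module; the project, instance and table IDs are placeholders, and 'cf1' is an assumed column family name:

from google.cloud import bigtable
from google.cloud.bigtable import row_filters

# With the emulator, BIGTABLE_EMULATOR_HOST must be set in the environment.
client = bigtable.Client(project="my-project", admin=True)   # placeholder project
instance = client.instance("my-instance")                    # placeholder instance
table = instance.table("testTable")                          # placeholder table

# Keep only cells in the 'arc_record_id' column whose value is between b"1" and b"2"
# (both bounds inclusive by default).
chain = row_filters.RowFilterChain(filters=[
    row_filters.ColumnQualifierRegexFilter(b"arc_record_id"),
    row_filters.ValueRangeFilter(start_value=b"1", end_value=b"2"),
])

for row in table.read_rows(filter_=chain):
    # row.cells maps column family -> qualifier -> list of cells
    for cell in row.cells["cf1"][b"arc_record_id"]:           # 'cf1' is an assumed family name
        print(row.row_key, cell.value)

Note that the filter compares values as byte strings, so numeric IDs stored as text are ordered lexicographically, not numerically.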

Cannot connect from Titan to Google Bigtable via Hbase client

Submitted by 早过忘川 on 2019-12-05 20:32:34
I am trying to connect Titan 1.0.0 with Hadoop 2 (HBase 1.0.2 client) (available at https://github.com/thinkaurelius/titan/wiki/Downloads) to the Google Cloud Bigtable service, using its HBase client. I could successfully connect to Bigtable from an HBase shell on a GCE instance by following this procedure. The hbase-site.xml follows the template, and I have downloaded the Bigtable jars for ALPN_VERSION=8.1.5.v20150921 (Oracle JDK SE 1.8.0_60):

<configuration>
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase1_0.BigtableConnection</value>
    <…

Bigtable performance: influence of column families

Submitted by 强颜欢笑 on 2019-12-05 15:56:42
We are currently investigating the influence of multiple column families on the performance of our Bigtable queries. We found that splitting the columns into multiple column families does not improve performance. Has anyone had similar experiences? Some more details about our benchmark setup: at the moment each row in our production table contains around 5 columns, each holding between 0.1 and 1 KB of data. All columns are stored in one column family. When performing a row-key range filter (which returns on average 340 rows) and applying a column regex filter (which returns …
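
For reference, a minimal sketch of the kind of query described above (row-key range scan plus column-qualifier regex filter) with the Python client; the key boundaries, regex pattern and all IDs are placeholders:

from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")                 # placeholder IDs
table = client.instance("my-instance").table("my-table")

# Scan a key range and keep only columns whose qualifier matches the regex.
qualifier_filter = row_filters.ColumnQualifierRegexFilter(b"col_[0-9]+")  # assumed pattern

rows = table.read_rows(
    start_key=b"user#1000",    # assumed range boundaries
    end_key=b"user#2000",
    filter_=qualifier_filter,
)
for row in rows:
    for family, qualifiers in row.cells.items():
        for qualifier, cells in qualifiers.items():
            print(row.row_key, family, qualifier, cells[0].value)

Whether the columns live in one family or several, the same scan shape applies; only the family names seen in row.cells change.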

Exceptions in Google Cloud Dataflow pipelines from BigQuery to Cloud Bigtable

Submitted by 点点圈 on 2019-12-04 19:10:54
Executing Dataflow pipelines, every once in a while we see these exceptions. Is there anything we can do about them? We have a fairly simple flow that reads data from a BigQuery query and populates Bigtable. Also, what happens to the data inside the pipeline? Is it reprocessed, or is it lost in transit to Bigtable?

CloudBigtableIO.initializeForWrite(p);
p.apply(BigQueryIO.Read.fromQuery(getQuery()))
 .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
     public void processElement(ProcessContext c) {
         Mutation output = convertDataToRow(c.element());
         c.output(output);
     }
 }))
 .apply(CloudBigtableIO…
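
For comparison, a rough sketch of the same read-transform-write shape using the Beam Python SDK and its Bigtable connector (this is not the pipeline from the question; the query, the 'id' key field, the 'cf1' column family and all IDs are placeholders):

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow

def to_bigtable_row(record):
    # Convert one BigQuery result dict into a Bigtable DirectRow (assumed schema).
    row = DirectRow(row_key=str(record["id"]).encode("utf-8"))           # assumed key column
    row.set_cell("cf1", b"value", str(record["value"]).encode("utf-8"))  # assumed family/column
    return row

with beam.Pipeline() as p:  # pass --project and --temp_location on the command line
    (p
     | "ReadBQ"  >> beam.io.ReadFromBigQuery(
           query="SELECT id, value FROM `my-project.my_dataset.my_table`",  # placeholder query
           use_standard_sql=True)
     | "ToRows"  >> beam.Map(to_bigtable_row)
     | "WriteBT" >> WriteToBigTable(project_id="my-project",   # placeholder IDs
                                    instance_id="my-instance",
                                    table_id="my-table"))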

HBase-Spark Connector: connection to HBase established for every scan?

Submitted by 别等时光非礼了梦想. on 2019-12-02 07:04:46
I am using Cloudera's HBase-Spark connector to do intensive HBase or Bigtable scans. It works OK, but looking at Spark's detailed logs, it looks like the code tries to re-establish a connection to HBase on every call that processes the results of a Scan(), which I do via JavaHBaseContext.foreachPartition(). Am I right to think that this code re-establishes a connection to HBase every time? If so, how can I rewrite it to make sure I reuse the already-established connection? Here's the full sample code that produces this behavior:

import org.apache.hadoop.hbase.client.ConnectionFactory;
…

Load Google Cloud Storage data into bigtable

Submitted by 会有一股神秘感。 on 2019-12-02 04:19:32
Is there an easy way, or an example, to load Google Cloud Storage data into Bigtable? I have lots of JSON files generated by PySpark and I wish to load the data into Bigtable, but I cannot find an easy way to do that. I have tried the Python code from google-cloud-python and it worked fine, but it just reads data line by line into Bigtable, which seemed strange to me. Any help would be greatly appreciated.

There is no simple tool to load data into Cloud Bigtable. Here are some options:

- Import the files using Dataflow. This requires Java development and learning the Dataflow programming model.
- Use Python …
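
For the Python-client route, one way to avoid writing row by row is to build DirectRow mutations and send them in batches with table.mutate_rows. A minimal sketch, assuming newline-delimited JSON files, a hypothetical 'id' field used as the row key, an assumed column family named 'cf1', and placeholder project, instance, table and file names:

import json

from google.cloud import bigtable
from google.cloud.bigtable import row as bt_row

client = bigtable.Client(project="my-project", admin=True)     # placeholder IDs
table = client.instance("my-instance").table("my-table")

BATCH_SIZE = 500  # arbitrary batch size, for illustration only

def json_lines_to_rows(path):
    # Yield one DirectRow per JSON line (assumed record layout with an 'id' key field).
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            r = bt_row.DirectRow(row_key=str(record["id"]).encode("utf-8"))
            for key, value in record.items():
                r.set_cell("cf1", key.encode("utf-8"), str(value).encode("utf-8"))
            yield r

batch = []
for direct_row in json_lines_to_rows("part-00000.json"):       # placeholder file name
    batch.append(direct_row)
    if len(batch) >= BATCH_SIZE:
        table.mutate_rows(batch)   # one batched RPC instead of one RPC per row
        batch = []
if batch:
    table.mutate_rows(batch)

For very large inputs, a Dataflow pipeline (as in the CSV sketch near the top of this page) remains the more scalable option.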