google-cloud-bigtable

Why are increments not supported in the Dataflow-Bigtable connector?

心已入冬 submitted on 2020-05-13 08:12:11
Question: We have a use case in streaming mode where we want to keep track of a counter in Bigtable from the pipeline (something like the number of items that have finished processing), for which we need the increment operation. From looking at https://cloud.google.com/bigtable/docs/dataflow-hbase, I see that the append/increment operations of the HBase API are not supported by this client. The reason stated is the retry logic in batch mode, but if Dataflow guarantees exactly-once, why would supporting it be a bad idea, since I know…
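
The limitation above applies to the Dataflow connector's batched write path; the standalone Cloud Bigtable HBase client does accept increments, so one workaround is to call it directly (for example from a connection managed in a DoFn). A minimal sketch, assuming hypothetical project, instance, table, family, and qualifier names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class CounterIncrement {
      public static void main(String[] args) throws Exception {
        // Hypothetical project/instance/table names -- replace with your own.
        Configuration conf = BigtableConfiguration.configure("my-project", "my-instance");
        try (Connection connection = BigtableConfiguration.connect(conf);
             Table table = connection.getTable(TableName.valueOf("my-counters"))) {
          // Atomically add 1 to the counter cell; returns the new value.
          long newValue = table.incrementColumnValue(
              Bytes.toBytes("items-processed"),   // row key
              Bytes.toBytes("stats"),             // column family
              Bytes.toBytes("count"),             // qualifier
              1L);
          System.out.println("counter = " + newValue);
        }
      }
    }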

Bigtable import error

断了今生、忘了曾经 submitted on 2020-03-05 07:26:01
Question: I generated a sequence file using Hive and am trying to import it into Bigtable, but my import job is failing with the error below. 2015-06-21 00:05:42,584 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1434843251631_0007_m_000000_1: Error: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hbase.io.ImmutableBytesWritable at com.google.cloud.bigtable.mapreduce.Import…
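
The cast error suggests the Hive-generated SequenceFile has BytesWritable keys, while the Bigtable import job (like HBase's Import) expects ImmutableBytesWritable keys. A hedged sketch of an adapter mapper that re-keys the records before import; the class is hypothetical, and depending on what the import expects for the value type (typically Result or KeyValue), the value may need a similar conversion:

    import java.io.IOException;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical adapter: converts BytesWritable keys from a Hive-generated
    // SequenceFile into the ImmutableBytesWritable keys the import job expects.
    public class ReKeyMapper
        extends Mapper<BytesWritable, BytesWritable, ImmutableBytesWritable, BytesWritable> {
      @Override
      protected void map(BytesWritable key, BytesWritable value, Context context)
          throws IOException, InterruptedException {
        context.write(new ImmutableBytesWritable(key.copyBytes()), value);
      }
    }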

Migration from DynamoDB to Spanner/BigTable

左心房为你撑大大i submitted on 2020-01-14 10:48:26
Question: I have a use case where I need to migrate 70 TB of data from DynamoDB to Bigtable and Spanner. Tables with a single index will go to Bigtable; otherwise they will go to Spanner. I can easily handle the historical loads by exporting the data to S3 --> GCS --> Spanner/Bigtable. But the challenging part is handling the incremental streaming loads that are happening on DynamoDB at the same time. There are 300 tables in DynamoDB. What is the best way to handle this? Has anyone done this before?
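
For the incremental part, one common pattern is to consume DynamoDB Streams and replay each change record as a Bigtable mutation. A minimal sketch of the write side only, using the Cloud Bigtable data client; the table name, the column family "cf", and the assumption of a single string partition key named "id" are all hypothetical:

    import java.util.Map;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.google.cloud.bigtable.data.v2.BigtableDataClient;
    import com.google.cloud.bigtable.data.v2.models.RowMutation;

    public class StreamReplayer {
      private final BigtableDataClient client;

      // client would be created elsewhere, e.g. BigtableDataClient.create(project, instance).
      public StreamReplayer(BigtableDataClient client) {
        this.client = client;
      }

      // Apply one DynamoDB stream "new image" to Bigtable.
      // Assumes the partition key attribute is a string named "id" (hypothetical).
      public void apply(Map<String, AttributeValue> newImage) {
        String rowKey = newImage.get("id").getS();
        RowMutation mutation = RowMutation.create("my-table", rowKey);
        for (Map.Entry<String, AttributeValue> e : newImage.entrySet()) {
          if (e.getValue().getS() != null) {
            // One column per DynamoDB string attribute, in a single family "cf".
            mutation.setCell("cf", e.getKey(), e.getValue().getS());
          }
        }
        client.mutateRow(mutation);
      }
    }

In practice the records would arrive via a Kinesis adapter, a Lambda consumer, or a streaming pipeline; only the Bigtable write is sketched here.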

Bigtable (from Go) returns “server closed the stream without sending trailers”

▼魔方 西西 submitted on 2020-01-05 23:20:11
Question: We are using Google Cloud Bigtable, accessing it from GCE instances with the Go client library. For some ReadRow queries we get the following error: rpc error: code = 13 desc = "server closed the stream without sending trailers". It is noteworthy that these are consistent: if we retry the same query (we wait ~15 minutes between attempts), we (almost?) always get the same error again. So it does not appear to simply be a transient error, but instead is probably somehow…

How to load data into Google Cloud Bigtable from Google BigQuery

旧时模样 submitted on 2019-12-31 02:09:48
Question: I need to populate data into Google Cloud Bigtable, and the source of the data will be Google BigQuery. As an exercise, I am able to read the data from BigQuery, and as a separate exercise I am able to write data into Bigtable as well. Now I have to combine these two operations into one Google Cloud Dataflow job. Any example would be of great help. Answer 1: You can just use the transforms as shown in those examples, adding whatever logic you need in between, for example: Pipeline p = Pipeline.create…
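
A hedged sketch of how the two steps might be combined in one pipeline, using Beam's BigQueryIO and the Cloud Bigtable HBase connector's CloudBigtableIO; the project, instance, table, and column names are assumptions, and transform names differ slightly between Dataflow SDK versions:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;

    public class BigQueryToBigtable {
      public static void main(String[] args) {
        // Hypothetical project/instance/table names.
        CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
            .withProjectId("my-project")
            .withInstanceId("my-instance")
            .withTableId("my-table")
            .build();

        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"))
         .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             TableRow row = c.element();
             // Assumed schema: a "key" column for the row key and a "value" column.
             Put put = new Put(Bytes.toBytes((String) row.get("key")));
             put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                 Bytes.toBytes((String) row.get("value")));
             c.output(put);
           }
         }))
         .apply(CloudBigtableIO.writeToTable(config));
        p.run();
      }
    }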

Which HBase connector for Spark 2.0 should I use?

ぐ巨炮叔叔 submitted on 2019-12-28 13:51:51
Question: Our stack is composed of Google Cloud Dataproc (Spark 2.0) and Google Cloud Bigtable (HBase 1.2.0), and I am looking for a connector that works with these versions. Spark 2.0 and new Dataset API support are not clear to me for the connectors I have found: spark-hbase (https://github.com/apache/hbase/tree/master/hbase-spark), spark-hbase-connector (https://github.com/nerdammer/spark-hbase-connector), and hortonworks-spark/shc (https://github.com/hortonworks-spark/shc). The project is written in Scala 2.11…
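
One lowest-common-denominator option that sidesteps the connector choice is Spark's generic newAPIHadoopRDD with HBase's TableInputFormat, backed by the Bigtable HBase client. A minimal sketch, assuming the Bigtable connection override set by BigtableConfiguration is honored by TableInputFormat and using hypothetical project, instance, and table names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class BigtableScan {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("bigtable-scan").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Hypothetical project/instance/table names.
        Configuration conf = BigtableConfiguration.configure("my-project", "my-instance");
        conf.set(TableInputFormat.INPUT_TABLE, "my-table");

        // Full-table scan exposed as an RDD of (row key, Result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = jsc.newAPIHadoopRDD(
            conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        System.out.println("row count: " + rows.count());
        spark.stop();
      }
    }

This yields an RDD rather than a Dataset, so it addresses the version-compatibility concern but not the Dataset API part of the question.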

How to set a future insert date in Google Cloud Bigtable? Trying to calculate it using TTL

*爱你&永不变心* submitted on 2019-12-25 00:47:06
Question: I have a table with only one column family; this column family has a TTL of 172800 seconds (2 days), and I need some data to be deleted before that deadline. If I want a value to expire in 5 minutes, I calculate the expiry time and set the insert date to be 5 minutes before the expiry time. I am using the HBase client for Java to do this, but the value doesn't seem to expire. Any suggestions? I used cbt to create the table: cbt createtable my_table families=cf1:maxage=2d HColumnDescriptor: {NAME =>…
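
One way to approximate a shorter lifetime under a 2-day maxage rule is to write the cell with an explicit timestamp far enough in the past that it crosses the 2-day age at the desired moment; a future timestamp makes the cell younger relative to the rule, so it would expire later, not sooner. A minimal sketch with the HBase client, with hypothetical connection details and names. Note that Bigtable garbage collection runs in the background, so expired cells can still be returned by reads until GC catches up unless the read also filters by timestamp.

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class ShortLivedWrite {
      public static void main(String[] args) throws Exception {
        long ttlMillis = 172_800_000L;          // column family maxage: 2 days
        long expireInMillis = 5 * 60 * 1000L;   // want the value gone in ~5 minutes
        // Backdate the cell so that (timestamp + maxage) == now + 5 minutes.
        long timestamp = System.currentTimeMillis() + expireInMillis - ttlMillis;

        try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
          Put put = new Put(Bytes.toBytes("row-1"));
          put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("col"), timestamp,
              Bytes.toBytes("short-lived value"));
          table.put(put);
        }
      }
    }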

Google Cloud Bigtable column versions are not deleted

 ̄綄美尐妖づ submitted on 2019-12-24 12:04:15
Question: We have created a table in Cloud Bigtable with two column families: one column family with 30 versions and the other with 1 version. However, when we query the table we are getting multiple versions of the columns for which we have set the maximum number of versions to 1. Table create statement: create 'myTable', {NAME => 'cf1', VERSIONS => '30'}, {NAME => 'cf2', VERSIONS => '1'} Describe 'myTable': {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '**1**', IN_MEMORY => 'false', KEEP_DELETED_CELLS =>…
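
Bigtable applies version-based garbage collection in the background, so cells beyond the configured maximum can still be returned by reads until GC has run; a common workaround is to also cap the number of versions on the read path. A minimal sketch with the HBase client, using hypothetical connection details:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class LatestVersionScan {
      public static void main(String[] args) throws Exception {
        try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
             Table table = connection.getTable(TableName.valueOf("myTable"))) {
          Scan scan = new Scan();
          scan.addFamily(Bytes.toBytes("cf2"));
          scan.setMaxVersions(1);  // only the newest cell per column, regardless of GC state
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
              System.out.println(Bytes.toString(result.getRow()));
            }
          }
        }
      }
    }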