google-cloud-bigtable

Why are increments not supported in the Dataflow-Bigtable connector?

心已入冬 submitted on 2020-05-13 08:12:11
Question: We have a use case in streaming mode where we want to keep track of a counter in Bigtable from the pipeline (something like the number of items that have finished processing), for which we need the increment operation. From looking at https://cloud.google.com/bigtable/docs/dataflow-hbase, I see that the append/increment operations of the HBase API are not supported by this client. The reason stated is the retry logic in batch mode, but if Dataflow guarantees exactly-once, why would supporting it be a bad idea, since I know…
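
The limitation above applies to the Dataflow connector's batched write path; the standalone Cloud Bigtable HBase client does accept increments, so one workaround is to call it directly (for example from a connection managed in a DoFn). A minimal sketch, assuming hypothetical project, instance, table, family, and qualifier names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class CounterIncrement {
      public static void main(String[] args) throws Exception {
        // Hypothetical project/instance/table names -- replace with your own.
        Configuration conf = BigtableConfiguration.configure("my-project", "my-instance");
        try (Connection connection = BigtableConfiguration.connect(conf);
             Table table = connection.getTable(TableName.valueOf("my-counters"))) {
          // Atomically add 1 to the counter cell; returns the new value.
          long newValue = table.incrementColumnValue(
              Bytes.toBytes("items-processed"),   // row key
              Bytes.toBytes("stats"),             // column family
              Bytes.toBytes("count"),             // qualifier
              1L);
          System.out.println("counter = " + newValue);
        }
      }
    }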

Bigtable import error

断了今生、忘了曾经 submitted on 2020-03-05 07:26:01
Question: I generated a sequence file using Hive and am trying to import it into Bigtable, but my import job is failing with the error below. 2015-06-21 00:05:42,584 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1434843251631_0007_m_000000_1: Error: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hbase.io.ImmutableBytesWritable at com.google.cloud.bigtable.mapreduce.Import…
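
The cast error suggests the Hive-generated SequenceFile has BytesWritable keys, while the Bigtable import job (like HBase's Import) expects ImmutableBytesWritable keys. A hedged sketch of an adapter mapper that re-keys the records before import; the class is hypothetical, and depending on what the import expects for the value type (typically Result or KeyValue), the value may need a similar conversion:

    import java.io.IOException;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical adapter: converts BytesWritable keys from a Hive-generated
    // SequenceFile into the ImmutableBytesWritable keys the import job expects.
    public class ReKeyMapper
        extends Mapper<BytesWritable, BytesWritable, ImmutableBytesWritable, BytesWritable> {
      @Override
      protected void map(BytesWritable key, BytesWritable value, Context context)
          throws IOException, InterruptedException {
        context.write(new ImmutableBytesWritable(key.copyBytes()), value);
      }
    }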

Migration from DynamoDB to Spanner/BigTable

左心房为你撑大大i submitted on 2020-01-14 10:48:26
Question: I have a use case where I need to migrate 70 TB of data from DynamoDB to Bigtable and Spanner. Tables with a single index will go to Bigtable; otherwise they will go to Spanner. I can easily handle the historical loads by exporting the data to S3 --> GCS --> Spanner/Bigtable. But the challenging part is handling the incremental streaming loads that are happening on DynamoDB at the same time. There are 300 tables in DynamoDB. What is the best way to handle this? Has anyone done this before?
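
For the incremental part, one common pattern is to consume DynamoDB Streams and replay each change record as a Bigtable mutation. A minimal sketch of the write side only, using the Cloud Bigtable data client; the table name, the column family "cf", and the assumption of a single string partition key named "id" are all hypothetical:

    import java.util.Map;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.google.cloud.bigtable.data.v2.BigtableDataClient;
    import com.google.cloud.bigtable.data.v2.models.RowMutation;

    public class StreamReplayer {
      private final BigtableDataClient client;

      // client would be created elsewhere, e.g. BigtableDataClient.create(project, instance).
      public StreamReplayer(BigtableDataClient client) {
        this.client = client;
      }

      // Apply one DynamoDB stream "new image" to Bigtable.
      // Assumes the partition key attribute is a string named "id" (hypothetical).
      public void apply(Map<String, AttributeValue> newImage) {
        String rowKey = newImage.get("id").getS();
        RowMutation mutation = RowMutation.create("my-table", rowKey);
        for (Map.Entry<String, AttributeValue> e : newImage.entrySet()) {
          if (e.getValue().getS() != null) {
            // One column per DynamoDB string attribute, in a single family "cf".
            mutation.setCell("cf", e.getKey(), e.getValue().getS());
          }
        }
        client.mutateRow(mutation);
      }
    }

In practice the records would arrive via a Kinesis adapter, a Lambda consumer, or a streaming pipeline; only the Bigtable write is sketched here.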

Bigtable (from Go) returns “server closed the stream without sending trailers”

▼魔方 西西 submitted on 2020-01-05 23:20:11
Question: We are using Google Cloud Bigtable, accessing it from GCE instances with the Go client library. For some ReadRow queries we get the following error: rpc error: code = 13 desc = "server closed the stream without sending trailers". It is noteworthy that these are consistent: if we retry the same query (we wait ~15 minutes between attempts), we (almost?) always get the same error again. So it does not appear to simply be a transient error, but instead is probably somehow…

How to load data into Google Cloud Bigtable from Google BigQuery

旧时模样 submitted on 2019-12-31 02:09:48
Question: I need to populate data into Google Cloud Bigtable, and the source of the data will be Google BigQuery. As an exercise, I am able to read the data from BigQuery, and as a separate exercise I am able to write data into Bigtable as well. Now I have to combine these two operations into one Google Cloud Dataflow job. Any example would be of great help. Answer 1: You can just use the transforms as shown in those examples, adding whatever logic you need in between, for example: Pipeline p = Pipeline.create…
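
A hedged sketch of how the two steps might be combined in one pipeline, using Beam's BigQueryIO and the Cloud Bigtable HBase connector's CloudBigtableIO; the project, instance, table, and column names are assumptions, and transform names differ slightly between Dataflow SDK versions:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;

    public class BigQueryToBigtable {
      public static void main(String[] args) {
        // Hypothetical project/instance/table names.
        CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
            .withProjectId("my-project")
            .withInstanceId("my-instance")
            .withTableId("my-table")
            .build();

        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"))
         .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             TableRow row = c.element();
             // Assumed schema: a "key" column for the row key and a "value" column.
             Put put = new Put(Bytes.toBytes((String) row.get("key")));
             put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                 Bytes.toBytes((String) row.get("value")));
             c.output(put);
           }
         }))
         .apply(CloudBigtableIO.writeToTable(config));
        p.run();
      }
    }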

Which HBase connector for Spark 2.0 should I use?

ぐ巨炮叔叔 submitted on 2019-12-28 13:51:51
Question: Our stack is composed of Google Cloud Dataproc (Spark 2.0) and Google Cloud Bigtable (HBase 1.2.0), and I am looking for a connector that works with these versions. Spark 2.0 and new Dataset API support are not clear to me for the connectors I have found: spark-hbase (https://github.com/apache/hbase/tree/master/hbase-spark), spark-hbase-connector (https://github.com/nerdammer/spark-hbase-connector), and hortonworks-spark/shc (https://github.com/hortonworks-spark/shc). The project is written in Scala 2.11…
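
One lowest-common-denominator option that sidesteps the connector choice is Spark's generic newAPIHadoopRDD with HBase's TableInputFormat, backed by the Bigtable HBase client. A minimal sketch, assuming the Bigtable connection override set by BigtableConfiguration is honored by TableInputFormat and using hypothetical project, instance, and table names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class BigtableScan {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("bigtable-scan").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Hypothetical project/instance/table names.
        Configuration conf = BigtableConfiguration.configure("my-project", "my-instance");
        conf.set(TableInputFormat.INPUT_TABLE, "my-table");

        // Full-table scan exposed as an RDD of (row key, Result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = jsc.newAPIHadoopRDD(
            conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        System.out.println("row count: " + rows.count());
        spark.stop();
      }
    }

This yields an RDD rather than a Dataset, so it addresses the version-compatibility concern but not the Dataset API part of the question.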

How to set a future insert date in Google Cloud Bigtable? Trying to calculate it using TTL

*爱你&永不变心* submitted on 2019-12-25 00:47:06
Question: I have a table with only one column family; this column family has a TTL of 172800 seconds (2 days), and I need some data to be deleted before that deadline. If I want a value to expire in 5 minutes, I calculate the expiry time and set the insert date to be 5 minutes before the expiry time. I am using the HBase client for Java to do this, but the value doesn't seem to expire. Any suggestions? I used cbt to create the table: cbt createtable my_table families=cf1:maxage=2d HColumnDescriptor: {NAME =>…
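
One way to approximate a shorter lifetime under a 2-day maxage rule is to write the cell with an explicit timestamp far enough in the past that it crosses the 2-day age at the desired moment; a future timestamp makes the cell younger relative to the rule, so it would expire later, not sooner. A minimal sketch with the HBase client, with hypothetical connection details and names. Note that Bigtable garbage collection runs in the background, so expired cells can still be returned by reads until GC catches up unless the read also filters by timestamp.

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class ShortLivedWrite {
      public static void main(String[] args) throws Exception {
        long ttlMillis = 172_800_000L;          // column family maxage: 2 days
        long expireInMillis = 5 * 60 * 1000L;   // want the value gone in ~5 minutes
        // Backdate the cell so that (timestamp + maxage) == now + 5 minutes.
        long timestamp = System.currentTimeMillis() + expireInMillis - ttlMillis;

        try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
          Put put = new Put(Bytes.toBytes("row-1"));
          put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("col"), timestamp,
              Bytes.toBytes("short-lived value"));
          table.put(put);
        }
      }
    }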

Google Cloud Bigtable column versions are not deleted

 ̄綄美尐妖づ submitted on 2019-12-24 12:04:15
Question: We have created a table in Cloud Bigtable with two column families: one column family with 30 versions and the other with 1 version. However, when we query the table we are getting multiple versions of the columns for which we have set the maximum number of versions to 1. Table create statement: create 'myTable', {NAME => 'cf1', VERSIONS => '30'}, {NAME => 'cf2', VERSIONS => '1'} Describe 'myTable': {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '**1**', IN_MEMORY => 'false', KEEP_DELETED_CELLS =>…
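
Bigtable applies version-based garbage collection in the background, so cells beyond the configured maximum can still be returned by reads until GC has run; a common workaround is to also cap the number of versions on the read path. A minimal sketch with the HBase client, using hypothetical connection details:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    public class LatestVersionScan {
      public static void main(String[] args) throws Exception {
        try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
             Table table = connection.getTable(TableName.valueOf("myTable"))) {
          Scan scan = new Scan();
          scan.addFamily(Bytes.toBytes("cf2"));
          scan.setMaxVersions(1);  // only the newest cell per column, regardless of GC state
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
              System.out.println(Bytes.toString(result.getRow()));
            }
          }
        }
      }
    }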