google-cloud-bigtable

Bigtable-BigQuery Import via Dataflow: 2 questions on table partitioning and timestamps

Submitted by 不羁岁月 on 2019-12-24 08:02:38
Question: I have a job in Dataflow importing data from Bigtable into BigQuery, using the built-in Dataflow connectors for both. I have two questions. Question 1: If the source data is one large table in Bigtable, how can I partition it dynamically into a set of smaller sub-tables in BigQuery, based on, say, a given Bigtable row key known only at run time? The Java code in Dataflow looks like this:

    p.apply(Read.from(CloudBigtableIO.read(config)))
     .apply(ParDo.of(new SomeDoFNonBTSourceData()))
     .apply
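
One possible approach, sketched here under assumptions not in the original post: the Beam 2.x BigQueryIO API (the snippet above looks like the older Dataflow 1.x SDK, where the names differ) lets the destination table be computed per element, so a field emitted by SomeDoFNonBTSourceData, such as a row-key prefix, can select the target table at run time. The project, dataset, and "rowkey_prefix" field below are made up for illustration.

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.ValueInSingleWindow;

    // 'rows' is the PCollection<TableRow> produced by SomeDoFNonBTSourceData; each TableRow
    // is assumed to carry a "rowkey_prefix" field derived from the Bigtable row key.
    static void writePartitioned(PCollection<TableRow> rows, TableSchema schema) {
      rows.apply(BigQueryIO.writeTableRows()
          .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
            @Override
            public TableDestination apply(ValueInSingleWindow<TableRow> input) {
              // The destination table is chosen per element, at execution time.
              String prefix = (String) input.getValue().get("rowkey_prefix");
              return new TableDestination("my-project:my_dataset.events_" + prefix, null);
            }
          })
          .withSchema(schema)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }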

Bigtable performance: influence of column families

Submitted by £可爱£侵袭症+ on 2019-12-22 08:06:22
Question: We are currently investigating the influence of using multiple column families on the performance of our Bigtable queries. We found that splitting the columns into multiple column families does not increase performance. Has anyone had similar experiences? Some more details about our benchmark setup: at the moment each row in our production table contains around 5 columns, each holding between 0.1 and 1 KB of data. All columns are stored in one column family. When performing a
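
The result that splitting into multiple families did not help may simply reflect the access pattern: if a benchmark always reads whole rows, every family is fetched anyway. Below is a small sketch with the HBase-compatible client (table, row key, and family names are hypothetical) of the case where families can matter, i.e. requesting only a subset of the row.

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // 'table' is an open org.apache.hadoop.hbase.client.Table pointing at Bigtable.
    // Full-row read: all column families come back, so splitting them brings no benefit here.
    Result whole = table.get(new Get(Bytes.toBytes("row-1")));

    // Restricted read: only the "meta" family is requested; this is the access pattern
    // where separate column families can reduce the data scanned and returned.
    Get partial = new Get(Bytes.toBytes("row-1"));
    partial.addFamily(Bytes.toBytes("meta"));
    Result subset = table.get(partial);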

Populating data in Google Cloud Bigtable is taking a long time

Submitted by 眉间皱痕 on 2019-12-21 23:07:12
Question: I am using the following code to populate data into Bigtable:

    CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
        .withConfiguration("clusterId", options.getBigTableClusterId())
        .withProjectId(options.getProject())
        .withInstanceId(options.getBigTableInstanceId())
        .withTableId(options.getOutputBTTable())
        .build();
    Pipeline p = Pipeline.create(options);
    /**
     * Read Data from Big Query
     */
    CloudBigtableIO.initializeForWrite(p);
    p.apply(BigQueryIO.Read.fromQuery
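
For reference, a sketch of how such a truncated pipeline typically continues in the Dataflow 1.x SDK: the BigQuery rows are turned into HBase Puts in a DoFn and handed to CloudBigtableIO.writeToTable. The getQuery() option, the "id" row-key field, and the "cf:payload" column below are assumptions, not taken from the post.

    p.apply(BigQueryIO.Read.fromQuery(options.getQuery()))
     .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
       @Override
       public void processElement(ProcessContext c) {
         TableRow row = c.element();
         // Build one Put per BigQuery row, keyed by a field from that row.
         Put put = new Put(Bytes.toBytes((String) row.get("id")));
         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(row.toString()));
         c.output(put);
       }
     }))
     .apply(CloudBigtableIO.writeToTable(config));
    p.run();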

Why does BigTable have column families?

Submitted by 邮差的信 on 2019-12-21 21:18:25
Question: Why is Bigtable structured as a two-level hierarchy of "family:qualifier"? Specifically, why is this enforced rather than just having columns and, say, recommending that users name their qualifiers "vertical:column"? I am interested in whether enforcing this enables some engineering optimizations or whether it is strictly a design decision. Answer 1: There are a couple of advantages to column families: queries become easier by getting a group of column qualifiers in a single column family; Bigtable
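
One concrete engineering advantage worth noting (a hedged example, not part of the quoted answer): settings such as the number of cell versions to keep are applied per column family, so grouping qualifiers into families lets different groups of columns get different retention. The table and family names below are hypothetical; the sketch uses the HBase-compatible admin API.

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;

    // "profile" keeps a single version while "events" keeps many; the policy is
    // expressible only because it attaches to the family, not to individual columns.
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users"));
    desc.addFamily(new HColumnDescriptor("profile").setMaxVersions(1));
    desc.addFamily(new HColumnDescriptor("events").setMaxVersions(100));
    admin.createTable(desc);   // 'admin' is an org.apache.hadoop.hbase.client.Admin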

Load Google Cloud Storage data into Bigtable

Submitted by 混江龙づ霸主 on 2019-12-20 05:23:38
Question: Is there an easy way or example to load Google Cloud Storage data into Bigtable? I have lots of JSON files generated by PySpark and I wish to load the data into Bigtable, but I cannot find an easy way to do that. I have tried the Python code from google-cloud-python and it worked fine, but it just reads data line by line into Bigtable, which seemed strange to me. Any help would be greatly appreciated. Answer 1: There is no simple tool to load data into Cloud Bigtable. Here are some options: Import the
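
One of the usual options (a sketch under assumptions, not the answer's own code) is a small Beam pipeline that reads the JSON lines from Cloud Storage with TextIO and writes HBase Puts through the Bigtable connector. The bucket path, project/instance/table ids, row-key derivation, and column names are all placeholders.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;

    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .build();

    Pipeline p = Pipeline.create();
    p.apply(TextIO.read().from("gs://my-bucket/output/*.json"))
     .apply(ParDo.of(new DoFn<String, Mutation>() {
       @DoFn.ProcessElement
       public void processElement(ProcessContext c) {
         String line = c.element();
         // In practice, parse the JSON (Gson/Jackson) and build a meaningful row key;
         // the hash below is just a placeholder to keep the sketch self-contained.
         Put put = new Put(Bytes.toBytes(String.valueOf(line.hashCode())));
         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(line));
         c.output(put);
       }
     }))
     .apply(CloudBigtableIO.writeToTable(config));
    p.run().waitUntilFinish();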

How to connect to a running Bigtable emulator from Java

Submitted by 微笑、不失礼 on 2019-12-19 07:39:20
Question: I am trying to use the Bigtable emulator from the gcloud beta emulators. I launch the emulator and grab the hostname (localhost) and port (in this instance 8885):

    gcloud beta emulators bigtable start
    Executing: /usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/bigtable-emulator/cbtemulator --host=localhost --port=8885

I am trying to connect to the emulator from a Java test client; here is what I provide:

    Configuration conf = BigtableConfiguration.configure(projectId, instanceId);
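
A hedged note on how this is commonly wired up: recent versions of the bigtable-hbase client honor the BIGTABLE_EMULATOR_HOST environment variable (gcloud beta emulators bigtable env-init prints the export line), in which case credentials and the real endpoint are bypassed and any project/instance ids will do. A minimal sketch assuming that variable is set for the test JVM:

    // export BIGTABLE_EMULATOR_HOST=localhost:8885   (or use: gcloud beta emulators bigtable env-init)
    Configuration conf = BigtableConfiguration.configure("fake-project", "fake-instance");
    try (Connection connection = BigtableConfiguration.connect(conf)) {
      // The ids above are placeholders; the client talks to the local emulator instead.
      Table table = connection.getTable(TableName.valueOf("test-table"));
      // ... exercise the table in the test
    }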

ValueProvider parameters not getting honored at template execution time

Submitted by 半腔热情 on 2019-12-13 03:26:14
Question: I am trying to pass the Bigtable tableId, instanceId and projectId, which are defined as ValueProvider in the TemplateOption class, at execution time, since they are runtime values, but they don't get honored with the new values. The pipeline gets executed with the old values that were defined when the pipeline was constructed. What changes should I make so that it honors the values supplied at runtime?

    Pipeline p = Pipeline.create(options);
    com.google.cloud.bigtable.config.BigtableOptions.Builder
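
A sketch of the general rule for templates, under assumptions not in the post: keep the parameters typed as ValueProvider end to end and only call get() inside a DoFn (or pass the ValueProvider to an IO that accepts one); calling get() while building the pipeline bakes in the construction-time values, which matches the behavior described above. The option and class names here are made up.

    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    public interface TemplateOptions extends PipelineOptions {
      ValueProvider<String> getBigtableTableId();
      void setBigtableTableId(ValueProvider<String> value);
    }

    static class WriteFn extends DoFn<KV<String, String>, Void> {
      private final ValueProvider<String> tableId;   // held unresolved until execution

      WriteFn(ValueProvider<String> tableId) { this.tableId = tableId; }

      @DoFn.ProcessElement
      public void processElement(ProcessContext c) {
        // get() is deferred to execution time, so the value supplied when the
        // template is launched is the one that takes effect.
        String table = tableId.get();
        // ... write c.element() to 'table'
      }
    }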

Bigtable error with sbt assembly fat JAR (Neither Jetty ALPN nor OpenSSL are available)

Submitted by 你说的曾经没有我的故事 on 2019-12-12 15:13:10
Question: I would like to build a RESTful API with akka-http able to retrieve data from Bigtable (HBase). The Bigtable client API requires netty-tcnative-boringssl-static to connect. This works fine inside my IntelliJ IDE, but when I build a fat JAR with sbt-assembly and then run the server, I get the following error:

    2017-01-10 12:03:41 ERROR BigtableSession:129 - Neither Jetty ALPN nor OpenSSL are available. OpenSSL unavailability cause:
    java.lang.IllegalArgumentException: Failed to load any
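
The IllegalArgumentException in that log generally means the netty-tcnative/boringssl native library could not be loaded from the assembled JAR. As a hedged diagnostic (not the fix itself), Netty can be asked directly from inside the fat JAR whether the native TLS provider is usable; io.netty.handler.ssl.OpenSsl ships with netty-handler, which the Bigtable client already depends on.

    import io.netty.handler.ssl.OpenSsl;

    public final class TlsCheck {
      public static void main(String[] args) {
        // Reports whether Netty can load the tcnative/boringssl native library from this JAR.
        System.out.println("OpenSSL available: " + OpenSsl.isAvailable());
        if (!OpenSsl.isAvailable()) {
          // The cause usually points at the missing or mismatched native artifact.
          OpenSsl.unavailabilityCause().printStackTrace();
        }
      }
    }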

Google Cloud Bigtable Client Connection Pooling

Submitted by 假装没事ソ on 2019-12-12 12:48:08
Question: I've done a load test against Google Cloud Bigtable by making a dummy web app that handles requests for writing data to and reading data from Bigtable. At the beginning I was using a single Bigtable connection as a singleton and reusing it across all threads (requests). When I increased the number of requests, I noticed that performance was getting slower. Somehow, instead of increasing the number of nodes, I got the idea of making multiple Bigtable connections and just randomly
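
For illustration, a minimal sketch of the "several connections, picked at random" idea described above (the pool size and class name are arbitrary). Note that the Bigtable HBase client already multiplexes requests over a pool of gRPC channels internally, so whether an application-level pool helps depends on the client version and its channel settings.

    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.hadoop.hbase.client.Connection;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    final class ConnectionPool {
      private final Connection[] pool;

      ConnectionPool(String projectId, String instanceId, int size) {
        pool = new Connection[size];
        for (int i = 0; i < size; i++) {
          pool[i] = BigtableConfiguration.connect(projectId, instanceId);
        }
      }

      Connection pick() {
        // Each request grabs one of the shared connections at random.
        return pool[ThreadLocalRandom.current().nextInt(pool.length)];
      }
    }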

Row timestamps in Bigtable - when are they updated?

Submitted by 泪湿孤枕 on 2019-12-11 17:55:54
Question: The definition of TimestampRangeFilter in Bigtable's Go API is: "TimestampRangeFilter returns a filter that matches any rows whose timestamp is within the given time bounds." Is the row timestamp updated when: any column value is written or changed within that row? The row key is updated? The row is created? Any other circumstances? Answer 1: I think this is a documentation bug. It should read something like: TimestampRangeFilter returns a filter that matches any cells whose timestamp is within the
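
To make the "cells, not rows" point concrete, here is a sketch using the HBase-compatible Java client (the row key, family, and timestamps are made up): every written value is a cell with its own timestamp, and a time-range read matches individual cells rather than whole rows.

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // 'table' is an open org.apache.hadoop.hbase.client.Table pointing at Bigtable.
    // Two cells in the same row, written with different explicit timestamps (milliseconds).
    long t1 = 1_000L, t2 = 2_000L;
    table.put(new Put(Bytes.toBytes("row-1"))
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), t1, Bytes.toBytes("v1"))
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("b"), t2, Bytes.toBytes("v2")));

    // The time range matches cells, not rows: only qualifier "b" is returned here.
    Get get = new Get(Bytes.toBytes("row-1"));
    get.setTimeRange(1_500L, 2_500L);
    Result result = table.get(get);
    for (Cell cell : result.rawCells()) {
      System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + " @ " + cell.getTimestamp());
    }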