google-cloud-bigtable

Bigtable-BigQuery Import via Dataflow: 2 questions on table partitioning and timestamps

Submitted by 不羁岁月 on 2019-12-24 08:02:38
Question: I have a job in Dataflow importing data from Bigtable into BigQuery, using the built-in Dataflow connectors for both. I have two questions. Question 1: If the source data is one large table in Bigtable, how can I partition it dynamically into a set of smaller sub-tables in BigQuery, based on, say, a given Bigtable row key known only at run time? The Java code in Dataflow looks like this:

    p.apply(Read.from(CloudBigtableIO.read(config)))
     .apply(ParDo.of(new SomeDoFNonBTSourceData()))
     .apply
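
One possible approach, sketched here under assumptions not in the original post: the Beam 2.x BigQueryIO API (the snippet above looks like the older Dataflow 1.x SDK, where the names differ) lets the destination table be computed per element, so a field emitted by SomeDoFNonBTSourceData, such as a row-key prefix, can select the target table at run time. The project, dataset, and "rowkey_prefix" field below are made up for illustration.

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.ValueInSingleWindow;

    // 'rows' is the PCollection<TableRow> produced by SomeDoFNonBTSourceData; each TableRow
    // is assumed to carry a "rowkey_prefix" field derived from the Bigtable row key.
    static void writePartitioned(PCollection<TableRow> rows, TableSchema schema) {
      rows.apply(BigQueryIO.writeTableRows()
          .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
            @Override
            public TableDestination apply(ValueInSingleWindow<TableRow> input) {
              // The destination table is chosen per element, at execution time.
              String prefix = (String) input.getValue().get("rowkey_prefix");
              return new TableDestination("my-project:my_dataset.events_" + prefix, null);
            }
          })
          .withSchema(schema)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }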

Bigtable performance: influence of column families

Submitted by £可爱£侵袭症+ on 2019-12-22 08:06:22
Question: We are currently investigating the influence of using multiple column families on the performance of our Bigtable queries. We found that splitting the columns into multiple column families does not increase performance. Has anyone had similar experiences? Some more details about our benchmark setup: at the moment each row in our production table contains around 5 columns, each holding between 0.1 and 1 KB of data. All columns are stored in one column family. When performing a
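
The result that splitting into multiple families did not help may simply reflect the access pattern: if a benchmark always reads whole rows, every family is fetched anyway. Below is a small sketch with the HBase-compatible client (table, row key, and family names are hypothetical) of the case where families can matter, i.e. requesting only a subset of the row.

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // 'table' is an open org.apache.hadoop.hbase.client.Table pointing at Bigtable.
    // Full-row read: all column families come back, so splitting them brings no benefit here.
    Result whole = table.get(new Get(Bytes.toBytes("row-1")));

    // Restricted read: only the "meta" family is requested; this is the access pattern
    // where separate column families can reduce the data scanned and returned.
    Get partial = new Get(Bytes.toBytes("row-1"));
    partial.addFamily(Bytes.toBytes("meta"));
    Result subset = table.get(partial);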

Populating data in Google Cloud Bigtable is taking a long time

Submitted by 眉间皱痕 on 2019-12-21 23:07:12
Question: I am using the following code to populate data into Bigtable:

    CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
        .withConfiguration("clusterId", options.getBigTableClusterId())
        .withProjectId(options.getProject())
        .withInstanceId(options.getBigTableInstanceId())
        .withTableId(options.getOutputBTTable())
        .build();
    Pipeline p = Pipeline.create(options);
    /**
     * Read Data from Big Query
     */
    CloudBigtableIO.initializeForWrite(p);
    p.apply(BigQueryIO.Read.fromQuery
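
For reference, a sketch of how such a truncated pipeline typically continues in the Dataflow 1.x SDK: the BigQuery rows are turned into HBase Puts in a DoFn and handed to CloudBigtableIO.writeToTable. The getQuery() option, the "id" row-key field, and the "cf:payload" column below are assumptions, not taken from the post.

    p.apply(BigQueryIO.Read.fromQuery(options.getQuery()))
     .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
       @Override
       public void processElement(ProcessContext c) {
         TableRow row = c.element();
         // Build one Put per BigQuery row, keyed by a field from that row.
         Put put = new Put(Bytes.toBytes((String) row.get("id")));
         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(row.toString()));
         c.output(put);
       }
     }))
     .apply(CloudBigtableIO.writeToTable(config));
    p.run();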

Why does BigTable have column families?

Submitted by 邮差的信 on 2019-12-21 21:18:25
Question: Why is Bigtable structured as a two-level hierarchy of "family:qualifier"? Specifically, why is this enforced rather than just having columns and, say, recommending that users name their qualifiers "vertical:column"? I am interested in whether enforcing this enables some engineering optimizations or whether it is strictly a design decision. Answer 1: There are a couple of advantages to column families: queries become easier by getting a group of column qualifiers in a single column family; Bigtable
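
One concrete engineering advantage worth noting (a hedged example, not part of the quoted answer): settings such as the number of cell versions to keep are applied per column family, so grouping qualifiers into families lets different groups of columns get different retention. The table and family names below are hypothetical; the sketch uses the HBase-compatible admin API.

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;

    // "profile" keeps a single version while "events" keeps many; the policy is
    // expressible only because it attaches to the family, not to individual columns.
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users"));
    desc.addFamily(new HColumnDescriptor("profile").setMaxVersions(1));
    desc.addFamily(new HColumnDescriptor("events").setMaxVersions(100));
    admin.createTable(desc);   // 'admin' is an org.apache.hadoop.hbase.client.Admin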

Load Google Cloud Storage data into Bigtable

Submitted by 混江龙づ霸主 on 2019-12-20 05:23:38
Question: Is there an easy way or example to load Google Cloud Storage data into Bigtable? I have lots of JSON files generated by PySpark and I wish to load the data into Bigtable, but I cannot find an easy way to do that. I have tried the Python code from google-cloud-python and it worked fine, but it just reads data line by line into Bigtable, which seemed strange to me. Any help would be greatly appreciated. Answer 1: There is no simple tool to load data into Cloud Bigtable. Here are some options: Import the
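
One of the usual options (a sketch under assumptions, not the answer's own code) is a small Beam pipeline that reads the JSON lines from Cloud Storage with TextIO and writes HBase Puts through the Bigtable connector. The bucket path, project/instance/table ids, row-key derivation, and column names are all placeholders.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;

    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .build();

    Pipeline p = Pipeline.create();
    p.apply(TextIO.read().from("gs://my-bucket/output/*.json"))
     .apply(ParDo.of(new DoFn<String, Mutation>() {
       @DoFn.ProcessElement
       public void processElement(ProcessContext c) {
         String line = c.element();
         // In practice, parse the JSON (Gson/Jackson) and build a meaningful row key;
         // the hash below is just a placeholder to keep the sketch self-contained.
         Put put = new Put(Bytes.toBytes(String.valueOf(line.hashCode())));
         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(line));
         c.output(put);
       }
     }))
     .apply(CloudBigtableIO.writeToTable(config));
    p.run().waitUntilFinish();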

How to connect to a running Bigtable emulator from Java

Submitted by 微笑、不失礼 on 2019-12-19 07:39:20
Question: I am trying to use the Bigtable emulator from the gcloud beta emulators. I launch the emulator and grab the hostname (localhost) and port (in this instance 8885):

    gcloud beta emulators bigtable start
    Executing: /usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/bigtable-emulator/cbtemulator --host=localhost --port=8885

I am trying to connect to the emulator from a Java test client; here is what I provide:

    Configuration conf = BigtableConfiguration.configure(projectId, instanceId);
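
A hedged note on how this is commonly wired up: recent versions of the bigtable-hbase client honor the BIGTABLE_EMULATOR_HOST environment variable (gcloud beta emulators bigtable env-init prints the export line), in which case credentials and the real endpoint are bypassed and any project/instance ids will do. A minimal sketch assuming that variable is set for the test JVM:

    // export BIGTABLE_EMULATOR_HOST=localhost:8885   (or use: gcloud beta emulators bigtable env-init)
    Configuration conf = BigtableConfiguration.configure("fake-project", "fake-instance");
    try (Connection connection = BigtableConfiguration.connect(conf)) {
      // The ids above are placeholders; the client talks to the local emulator instead.
      Table table = connection.getTable(TableName.valueOf("test-table"));
      // ... exercise the table in the test
    }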

ValueProvider parameters not getting honored at template execution time

Submitted by 半腔热情 on 2019-12-13 03:26:14
Question: I am trying to pass the Bigtable tableId, instanceId and projectId, which are defined as ValueProvider in the TemplateOption class, at execution time, since they are runtime values, but they don't get honored with the new values. The pipeline gets executed with the old values that were defined when the pipeline was constructed. What changes should I make so that it honors the values supplied at runtime?

    Pipeline p = Pipeline.create(options);
    com.google.cloud.bigtable.config.BigtableOptions.Builder
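
A sketch of the general rule for templates, under assumptions not in the post: keep the parameters typed as ValueProvider end to end and only call get() inside a DoFn (or pass the ValueProvider to an IO that accepts one); calling get() while building the pipeline bakes in the construction-time values, which matches the behavior described above. The option and class names here are made up.

    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    public interface TemplateOptions extends PipelineOptions {
      ValueProvider<String> getBigtableTableId();
      void setBigtableTableId(ValueProvider<String> value);
    }

    static class WriteFn extends DoFn<KV<String, String>, Void> {
      private final ValueProvider<String> tableId;   // held unresolved until execution

      WriteFn(ValueProvider<String> tableId) { this.tableId = tableId; }

      @DoFn.ProcessElement
      public void processElement(ProcessContext c) {
        // get() is deferred to execution time, so the value supplied when the
        // template is launched is the one that takes effect.
        String table = tableId.get();
        // ... write c.element() to 'table'
      }
    }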

Bigtable error with sbt assembly fat JAR (Neither Jetty ALPN nor OpenSSL are available)

Submitted by 你说的曾经没有我的故事 on 2019-12-12 15:13:10
Question: I would like to build a RESTful API with akka-http able to retrieve data from Bigtable (HBase). The Bigtable client API requires netty-tcnative-boringssl-static to connect. This works fine inside my IntelliJ IDE, but when I build a fat JAR with sbt-assembly and then run the server, I get the following error:

    2017-01-10 12:03:41 ERROR BigtableSession:129 - Neither Jetty ALPN nor OpenSSL are available. OpenSSL unavailability cause:
    java.lang.IllegalArgumentException: Failed to load any
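
The IllegalArgumentException in that log generally means the netty-tcnative/boringssl native library could not be loaded from the assembled JAR. As a hedged diagnostic (not the fix itself), Netty can be asked directly from inside the fat JAR whether the native TLS provider is usable; io.netty.handler.ssl.OpenSsl ships with netty-handler, which the Bigtable client already depends on.

    import io.netty.handler.ssl.OpenSsl;

    public final class TlsCheck {
      public static void main(String[] args) {
        // Reports whether Netty can load the tcnative/boringssl native library from this JAR.
        System.out.println("OpenSSL available: " + OpenSsl.isAvailable());
        if (!OpenSsl.isAvailable()) {
          // The cause usually points at the missing or mismatched native artifact.
          OpenSsl.unavailabilityCause().printStackTrace();
        }
      }
    }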

Google Cloud Bigtable Client Connection Pooling

Submitted by 假装没事ソ on 2019-12-12 12:48:08
Question: I've done a load test against Google Cloud Bigtable by making a dummy web app that handles requests for writing data to and reading data from Bigtable. At the beginning I was using a single Bigtable connection as a singleton and reusing it across all threads (requests). When I increased the number of requests, I noticed that performance was getting slower. Somehow, instead of increasing the number of nodes, I got the idea of making multiple Bigtable connections and just randomly
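
For illustration, a minimal sketch of the "several connections, picked at random" idea described above (the pool size and class name are arbitrary). Note that the Bigtable HBase client already multiplexes requests over a pool of gRPC channels internally, so whether an application-level pool helps depends on the client version and its channel settings.

    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.hadoop.hbase.client.Connection;
    import com.google.cloud.bigtable.hbase.BigtableConfiguration;

    final class ConnectionPool {
      private final Connection[] pool;

      ConnectionPool(String projectId, String instanceId, int size) {
        pool = new Connection[size];
        for (int i = 0; i < size; i++) {
          pool[i] = BigtableConfiguration.connect(projectId, instanceId);
        }
      }

      Connection pick() {
        // Each request grabs one of the shared connections at random.
        return pool[ThreadLocalRandom.current().nextInt(pool.length)];
      }
    }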

Row timestamps in Bigtable - when are they updated?

Submitted by 泪湿孤枕 on 2019-12-11 17:55:54
Question: The definition of TimestampRangeFilter in Bigtable's Go API is: "TimestampRangeFilter returns a filter that matches any rows whose timestamp is within the given time bounds." Is the row timestamp updated when: any column value is written or changed within that row? The row key is updated? The row is created? Any other circumstances? Answer 1: I think this is a documentation bug. It should read something like: TimestampRangeFilter returns a filter that matches any cells whose timestamp is within the
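
To make the "cells, not rows" point concrete, here is a sketch using the HBase-compatible Java client (the row key, family, and timestamps are made up): every written value is a cell with its own timestamp, and a time-range read matches individual cells rather than whole rows.

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // 'table' is an open org.apache.hadoop.hbase.client.Table pointing at Bigtable.
    // Two cells in the same row, written with different explicit timestamps (milliseconds).
    long t1 = 1_000L, t2 = 2_000L;
    table.put(new Put(Bytes.toBytes("row-1"))
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), t1, Bytes.toBytes("v1"))
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("b"), t2, Bytes.toBytes("v2")));

    // The time range matches cells, not rows: only qualifier "b" is returned here.
    Get get = new Get(Bytes.toBytes("row-1"));
    get.setTimeRange(1_500L, 2_500L);
    Result result = table.get(get);
    for (Cell cell : result.rawCells()) {
      System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + " @ " + cell.getTimestamp());
    }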