HBase-Spark Connector: connection to HBase established for every scan?

别等时光非礼了梦想. Submitted on 2019-12-02 07:04:46

This is a common problem. The cost of creating a connection can dwarf the actual work you're doing.

In Cloud Bigtable, you can set google.bigtable.use.cached.data.channel.pool to true in your configuration settings. That can significantly improve performance. Cloud Bigtable ultimately uses a single HTTP/2 endpoint for all of your Cloud Bigtable instances.

I don't know of a similar construct in HBase, but one way to get the same effect would be to write an implementation of Connection that creates a single cached Connection under the covers and delegates to it. You would then set hbase.client.connection.impl to your new class.
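A minimal sketch of that idea in plain Java, with no HBase dependency so it runs on its own. The Connection interface below is a hypothetical, trimmed-down stand-in for org.apache.hadoop.hbase.client.Connection, and PhysicalConnection, CachedConnection, and closeShared() are illustrative names, not HBase API. Each CachedConnection lazily creates one shared physical connection per JVM and delegates to it. Closing a wrapper leaves the shared connection open, so the next scan reuses it instead of reconnecting.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, trimmed-down stand-in for org.apache.hadoop.hbase.client.Connection.
// The real interface also has getTable(TableName), getAdmin(), getRegionLocator(), etc.
interface Connection extends AutoCloseable {
    String getTable(String name);   // stand-in for Connection.getTable(TableName)
    boolean isClosed();
    @Override
    void close();                   // narrowed: no checked exception
}

// Hypothetical "expensive" physical connection; counts how often it is built.
final class PhysicalConnection implements Connection {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean closed;

    PhysicalConnection() {
        CREATED.incrementAndGet();  // in real HBase this is where the costly setup happens
    }

    public String getTable(String name) { return "table:" + name; }
    public boolean isClosed() { return closed; }
    public void close() { closed = true; }
}

// Hypothetical class you would name in hbase.client.connection.impl.
// Every instance delegates to ONE shared physical connection per JVM.
final class CachedConnection implements Connection {
    private static final Object LOCK = new Object();
    private static PhysicalConnection shared;  // lazily created, reused by all wrappers

    private volatile boolean closed;

    CachedConnection() {
        acquire();  // connect eagerly, but only if no shared connection exists yet
    }

    // Returns the shared connection, creating it if missing or closed.
    private static PhysicalConnection acquire() {
        synchronized (LOCK) {
            if (shared == null || shared.isClosed()) {
                shared = new PhysicalConnection();
            }
            return shared;
        }
    }

    public String getTable(String name) {
        if (closed) throw new IllegalStateException("connection closed");
        return acquire().getTable(name);
    }

    public boolean isClosed() { return closed; }

    // Closing a wrapper does NOT close the shared connection; that is the point.
    public void close() { closed = true; }

    // Explicit shutdown, e.g. from a JVM shutdown hook on each executor.
    static void closeShared() {
        synchronized (LOCK) {
            if (shared != null) {
                shared.close();
                shared = null;
            }
        }
    }
}

public class CachedConnectionDemo {
    public static void main(String[] args) {
        // Simulate one executor running three scans, each opening and
        // closing its own connection.
        for (int scan = 0; scan < 3; scan++) {
            try (Connection conn = new CachedConnection()) {
                System.out.println(conn.getTable("t1"));
            }
        }
        // prints "physical connections created: 1"
        System.out.println("physical connections created: " + PhysicalConnection.CREATED.get());
        CachedConnection.closeShared();
    }
}
```

In a real implementation you would implement the full org.apache.hadoop.hbase.client.Connection interface, delegate every method to the cached instance, give the class whatever constructor ConnectionFactory expects in your HBase version, and set hbase.client.connection.impl to its fully qualified name. Keeping the shared connection open until the JVM exits trades a little idle resource use for not paying the connection cost on every scan.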
