Column-based or row-based for HBase

长发绾君心 2021-02-03 13:22

I am wondering whether HBase uses column-based or row-based storage.

  • I read some technical documents, and they mention that an advantage of HBase is using column
2 answers
  •  执念已碎
    2021-02-03 13:51

    George, here's a presentation I gave about understanding HBase schemas from HBaseCon 2012:

    http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html

    In short, each row in HBase is actually a key/value map, where you can have any number of columns (keys), each of which has a value. (Technically, each column can also hold multiple values with different timestamps.)
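To make the "row as a key/value map" idea concrete, here is a minimal sketch in plain Python. It is only a toy model and does not use the HBase client API; the `Row` class, its method names, and the sample data are all hypothetical.

```python
from collections import defaultdict


class Row:
    """Toy model of one HBase row: column name -> {timestamp: value}."""

    def __init__(self):
        # Any number of columns can appear; none are declared up front.
        self.cells = defaultdict(dict)

    def put(self, column, value, timestamp):
        # Writing with a new timestamp adds a version instead of overwriting.
        self.cells[column][timestamp] = value

    def get(self, column, max_versions=1):
        # Return (timestamp, value) pairs, newest first.
        versions = self.cells.get(column, {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


row = Row()
row.put("name", "Ann", timestamp=1)
row.put("email", "old@example.com", timestamp=1)
row.put("email", "new@example.com", timestamp=2)

latest_email = row.get("email")
all_emails = row.get("email", max_versions=2)
```

Reading `email` with the default of one version yields only the newest value, while asking for two versions yields both, newest first.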

    Additionally, "column families" allow you to host multiple key/value maps in the same row, stored in different physical (on-disk) files. This helps in situations where some sets of values are usually accessed separately from other sets, so there is less to read off disk. The trade-off is that it's more work to read all the values in a row if you split its columns across two column families, because twice as many disk accesses are needed.
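The read trade-off between column families can be sketched with another toy model, again in plain Python rather than the HBase API. Each family gets its own dict standing in for its on-disk files, and a counter tracks how many stores a read touches. The `Table` class, the family names `profile` and `activity`, and the counter are assumptions made for illustration.

```python
class Table:
    """Toy model: every column family is a separate store."""

    def __init__(self, families):
        # One dict per family, standing in for that family's files on disk.
        self.stores = {family: {} for family in families}
        self.store_reads = 0

    def put(self, row_key, family, qualifier, value):
        self.stores[family].setdefault(row_key, {})[qualifier] = value

    def get(self, row_key, families=None):
        # Reading fewer families touches fewer stores.
        wanted = families if families is not None else list(self.stores)
        result = {}
        for family in wanted:
            self.store_reads += 1
            cells = self.stores[family].get(row_key, {})
            for qualifier, value in cells.items():
                result[f"{family}:{qualifier}"] = value
        return result


table = Table(["profile", "activity"])
table.put("user1", "profile", "name", "Ann")
table.put("user1", "activity", "last_login", "2021-02-03")

profile_only = table.get("user1", families=["profile"])
reads_for_profile = table.store_reads

whole_row = table.get("user1")
reads_for_whole_row = table.store_reads - reads_for_profile
```

In this model, fetching only `profile` touches one store, while fetching the whole row touches two, which mirrors the cost described above.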

    Unlike more standard "column oriented" databases, I've never heard of anyone creating an HBase table that had a column family for every logical column. There's overhead associated with column families, and the general advice is usually to have no more than 3 or 4 of them. Column families are "design time" information, meaning you must specify them at the time you create (or alter) the table.
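The "design time" point can be sketched as follows: families must exist before a write, while column names inside a family can be made up at write time. This is a hypothetical plain-Python model, not HBase's real admin or client API; `Schema`, `alter_add_family`, and `check_put` are invented names.

```python
class Schema:
    """Toy model: families are declared up front, qualifiers are not."""

    def __init__(self, families):
        self.families = set(families)
        self.qualifiers_seen = {family: set() for family in families}

    def alter_add_family(self, family):
        # Stands in for altering the table to add a family.
        self.families.add(family)
        self.qualifiers_seen.setdefault(family, set())

    def check_put(self, family, qualifier):
        # A write to an undeclared family is rejected.
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        # Any qualifier is accepted within a declared family.
        self.qualifiers_seen[family].add(qualifier)


schema = Schema(["d"])
schema.check_put("d", "any_new_column")  # fine: qualifiers are ad hoc

try:
    schema.check_put("stats", "views")
    rejected = False
except ValueError:
    rejected = True  # "stats" was never declared

schema.alter_add_family("stats")
schema.check_put("stats", "views")  # accepted after the alter step
```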

    Generally, I find column families to be an advanced design option that you'd only use once you have a deep understanding of HBase's architecture and can show that it would be a net benefit.

    So overall, while it's true that HBase can act in a "column oriented" way, it's not the default nor the most common design pattern in HBase. It's better to think of it as a row store with key/value maps.
