HBase column family

旧时难觅i 2021-02-15 04:22

The HBase documentation says to avoid creating more than 2-3 column families, because HBase does not handle more than 2-3 column families very well. The reason for this is compacti

1 answer
  • 2021-02-15 04:44

    Currently (though this is expected to change), all of the column families for a region are flushed together. This is the primary reason why people say "HBase doesn't do well with more than 2 or 3 column families". Consider two CFs, each with one column. Column A:A stores the full text of web pages. Column B:B stores the number of words in each page. Every time we flush A:A (which happens more often because A:A's data is far bigger), we also have to go through a whole separate file I/O routine for column B:B, even though there is no need to: with B:B holding only numbers, it could go for months without being flushed.

    If you store A and B in the same column family (A:A and A:B), you will probably see vastly better flush I/O performance, and because most HBase reads are purely from the memstore, you will probably find that read speeds are equivalent.
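The flush behaviour described above can be sketched with a toy model. This is not HBase code. It only assumes that a region flushes every column family at once when its total buffered size reaches a threshold. The function name `simulate_flushes`, the write sizes, and the threshold are all made up for illustration.

```python
from collections import defaultdict


def simulate_flushes(writes, flush_threshold):
    """Toy model of region-wide flushing (an assumption, not HBase internals).

    writes: iterable of (column_family, size_in_bytes) pairs.
    Returns a dict mapping each column family to the sizes of the
    files it flushed.
    """
    memstore = defaultdict(int)   # bytes currently buffered per family
    files = defaultdict(list)     # flushed file sizes per family
    for family, size in writes:
        memstore[family] += size
        # The flush decision looks at the whole region, not one family.
        if sum(memstore.values()) >= flush_threshold:
            for cf, buffered in memstore.items():
                if buffered > 0:
                    files[cf].append(buffered)
            memstore.clear()
    return dict(files)


# Hypothetical workload: each page writes 1000 bytes of text
# and an 8-byte word count.
two_families = [w for _ in range(100) for w in (("A", 1000), ("B", 8))]
one_family = [w for _ in range(100) for w in (("A", 1000), ("A", 8))]

split = simulate_flushes(two_families, flush_threshold=10_000)
merged = simulate_flushes(one_family, flush_threshold=10_000)

print("two families:", {cf: len(sizes) for cf, sizes in split.items()})
print("one family:  ", {cf: len(sizes) for cf, sizes in merged.items()})
```

With these made-up numbers, the two-family layout writes twice as many files as the single-family layout, and every file for B holds less than 100 bytes. A real cluster will behave differently in the details, so treat this only as an illustration of why the flush I/O differs.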

    Also, and perhaps more importantly, if the cardinality of the columns is wildly different, then your regionservers will need to maintain useless mostly-empty files for your less-dense column families. This will never change.

    All of this is available in the HBase Book.

    So, as in all such performance situations, measure before deciding what the "correct" path is.
