HBase row key design for monotonically increasing keys

北荒 2021-02-14 07:11

I've an HBase table where I'm writing the row keys like:

~1
~2
~3
...
~9
~10
4 Answers
  • 2021-02-14 07:43

    How should a row key be designed so that the row with key ~10 comes last?

    You see the scan output this way because rowkeys in HBase are kept sorted lexicographically, irrespective of insertion order. HBase treats a rowkey as an array of bytes and compares keys byte by byte from left to right, so the ordering follows the string representation rather than the numeric value. The lowest-order rowkey appears first in a table. That's why ~10 appears before ~2: the comparison is decided at the second byte, where the key that has already ended (~1) or continues with a smaller byte (~10) sorts ahead of ~2. See the Rows section on this page to learn more.
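
    To make the ordering concrete, here is a minimal sketch in plain Python (no HBase involved). It relies on the fact that Python compares `bytes` objects byte by byte, which mirrors the lexicographic comparison described above. The key format is taken from the question; everything else is illustrative.

```python
# Row keys as the question writes them: "~1" ... "~10", encoded as bytes.
keys = [f"~{n}".encode("ascii") for n in range(1, 11)]

# Python orders bytes objects byte by byte, left to right, which is the
# same kind of lexicographic comparison the answer describes.
scan_order = sorted(keys)

for key in scan_order:
    print(key.decode("ascii"))
```

    The key ~10 lands directly after ~1 and ahead of ~2, which matches the scan output in the question.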

    When you left-pad the integers with zeros, their natural order is preserved under lexicographic sorting, which is why the scan order then matches the order in which you inserted the data. You can design your rowkeys that way, as suggested by @shutty.
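
    A small sketch of the zero-padding idea, again in plain Python. The width of 4 and the helper name `padded_key` are assumptions made for illustration; pick a width that fits the largest counter you ever expect, because a key that outgrows the width breaks the ordering.

```python
WIDTH = 4  # hypothetical fixed width; must fit the largest expected counter


def padded_key(n: int, width: int = WIDTH) -> bytes:
    """Build a fixed-length key such as b'~0010' from a non-negative counter."""
    if n < 0 or len(str(n)) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return f"~{n:0{width}d}".encode("ascii")


# Insert in arbitrary order; byte-wise sorting restores numeric order.
keys = [padded_key(n) for n in (10, 2, 9, 1, 3)]
scan_order = sorted(keys)

for key in scan_order:
    print(key.decode("ascii"))
```

    With every key the same length, byte-wise order and numeric order agree, so ~0010 now sorts after ~0009.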

    I'm looking for some recommended ways or the ways that are more popular for designing HBase row keys.

    Here are some general guidelines for a good design:

    • Keep the rowkey as small as possible.
    • Avoid monotonically increasing rowkeys, such as timestamps. This is a poor schema design and leads to RegionServer hotspotting. If you can't avoid them, use a technique such as hashing or salting to spread the writes.
    • Avoid using Strings as rowkeys if possible. The String representation of a number takes more bytes than its integer or long representation. For example, a long is 8 bytes, and those eight bytes can hold an unsigned number up to 18,446,744,073,709,551,615. Stored as a String, at one byte per character, the same number needs 20 bytes, about 2.5 times as many.
    • Use some mechanism, such as hashing, to distribute rows uniformly if your regions are not evenly loaded. You can also create pre-split tables to achieve this.
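
    The size argument in the third guideline can be checked with a few lines of Python. This is only arithmetic on byte lengths, using the standard `struct` module to stand in for a binary encoding; it does not call any HBase API.

```python
import struct

largest = 2**64 - 1  # 18,446,744,073,709,551,615

# Eight bytes are enough for the binary (unsigned, big-endian) form.
as_binary = struct.pack(">Q", largest)

# One byte per character for the decimal string form.
as_text = str(largest).encode("ascii")

print(len(as_binary), "bytes as a binary 64-bit value")
print(len(as_text), "bytes as a decimal string")
```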

    See this link for more on rowkey design.

    HTH

  • 2021-02-14 07:47

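    Fixed-length keys are strongly recommended where possible. Bytes.toBytes(long) can be used to get a byte array from a counter. The resulting keys sort in numeric order for non-negative longs, up to Long.MAX_VALUE; negative values have the sign bit set and therefore sort after the positive ones.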
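
    To illustrate why this works, here is a sketch in plain Python. It assumes the usual 8-byte big-endian two's-complement layout for a long, which as far as I know is what Bytes.toBytes(long) produces; `long_key` is a hypothetical stand-in, not the HBase call itself.

```python
import struct


def long_key(n: int) -> bytes:
    """8-byte big-endian two's-complement encoding of a signed 64-bit integer."""
    return struct.pack(">q", n)


counters = [10, 2, 9, 1, 3, 2**63 - 1]

# Sort the encoded keys byte-wise, then decode to inspect the order.
encoded_sorted = sorted(long_key(n) for n in counters)
decoded = [struct.unpack(">q", key)[0] for key in encoded_sorted]
print(decoded)

# A negative value starts with a 1 bit (byte 0xff for -1), so byte-wise it
# compares greater than every non-negative value.
print(long_key(-1) > long_key(2**63 - 1))
```

    Every key is exactly 8 bytes, so no padding is needed, and byte-wise order matches numeric order as long as the counter never goes negative.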

  • 2021-02-14 07:54

    HBase stores rowkeys in lexicographical order, so you can try this schema with a fixed-length rowkey:

    <prefix>~0001
    <prefix>~0002
    <prefix>~0003
    ...
    <prefix>~0009
    <prefix>~0010
    

    Keep in mind that you should also use random prefixes to avoid region hot-spotting (when a single region receives most of the writes while the other regions sit idle).
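
    One way to sketch the prefix idea in plain Python is shown below. Note the swap: instead of a truly random prefix it derives the prefix from a hash of the counter, so a reader can recompute it when fetching a row. The bucket count of 8, the two-digit prefix format and the name `salted_key` are all assumptions for illustration, not something the answer specifies.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; often chosen to match the number of regions


def salted_key(n: int, width: int = 4, buckets: int = NUM_BUCKETS) -> bytes:
    """Build '<bucket>~<zero-padded counter>', e.g. b'05~0010'."""
    base = f"{n:0{width}d}"
    # Deterministic salt: the first byte of a SHA-256 digest, reduced to a bucket.
    digest = hashlib.sha256(base.encode("ascii")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}~{base}".encode("ascii")


# Consecutive counters get prefixes from different buckets, so consecutive
# writes are spread over the key space instead of hitting one end of it.
for n in range(1, 11):
    print(salted_key(n).decode("ascii"))
```

    The trade-off is that rows are no longer globally ordered by counter: reading a range in counter order takes one scan per bucket, with the results merged on the client.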

  • 2021-02-14 07:55

    Monotonically increasing keys aren't a good schema for HBase. You can read more here: http://hbase.apache.org/book/rowkey.design.html

    There is also a link there to OpenTSDB, which solves this problem.
