Bigtable performance: influence of column families

Submitted by 强颜欢笑 on 2019-12-05 15:56:42

If you are retrieving X cells per row, it makes no major performance difference whether those cells are in X separate column families or in 1 column family with X column qualifiers.

The performance difference comes in when you only need the cells of a row that serve some specific purpose: you can then avoid selecting all cells for the row and instead fetch just one column family (by specifying a filter on the ReadRow call).
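To make the filtering point concrete, here is a minimal in-memory sketch. It models one row as `family -> qualifier -> value` and uses a hypothetical `read_row(row, family)` helper to play the role of a column-family filter on a read. This is not the Cloud Bigtable client. In the Python client, as far as I know, the equivalent is passing `row_filters.FamilyNameRegexFilter("game_scores")` as the `filter_` argument of `Table.read_row`. The example values are made up.

```python
# Hypothetical in-memory model of one Bigtable row: family -> qualifier -> value.
# This is NOT the google-cloud-bigtable client; it only illustrates why a
# column-family filter returns less data than reading the whole row.
from typing import Dict, Optional

Row = Dict[str, Dict[str, bytes]]


def read_row(row: Row, family: Optional[str] = None) -> Row:
    """Return the whole row, or only the cells of one column family.

    `family` stands in for a column-family filter on a read call.
    """
    if family is None:
        return {fam: dict(cols) for fam, cols in row.items()}
    return {family: dict(row.get(family, {}))}


# Made-up example row matching the leaderboard schema discussed in this answer.
alice: Row = {
    "user_info": {"full_name": b"Alice Example", "password_hash": b"<hash>"},
    "game_scores": {"candy_royale": b"1200", "clash_of_tanks": b"870"},
}

full = read_row(alice)
scores_only = read_row(alice, family="game_scores")
print(sum(len(c) for c in full.values()))         # 4 cells
print(sum(len(c) for c in scores_only.values()))  # 2 cells
```

With the filter, only the `game_scores` cells come back, so less data is read and transferred per row.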


A more important factor is simply picking a schema that accurately describes your data. If you do this, any gain of the type above will come naturally, and you will also stay under the recommended limit of 100 column families.

For example, imagine you are writing leaderboard software and want to store the score a player has achieved in each game, along with some personal details. Your schema might be:

  • Row Key: username
  • Column Family user_info
    • Column Qualifier full_name
    • Column Qualifier password_hash
  • Column Family game_scores
    • Column Qualifier candy_royale
    • Column Qualifier clash_of_tanks
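The schema above can be modelled with a small sketch. It assumes a hypothetical `Table` class in which column families must be declared up front (in Bigtable, families are part of the table definition) while column qualifiers are created implicitly on write. The game name `new_game` and all values are made up for illustration. The point is that adding a game adds a qualifier, not a column family.

```python
# Hypothetical model of the leaderboard table. Column families are declared
# up front; column qualifiers are created implicitly when a cell is written.
# This is an illustration, not the Cloud Bigtable client API.
from typing import Dict, Set


class Table:
    def __init__(self, families: Set[str]) -> None:
        self.families = set(families)
        # row key -> family -> qualifier -> value
        self.rows: Dict[str, Dict[str, Dict[str, bytes]]] = {}

    def set_cell(self, row_key: str, family: str, qualifier: str, value: bytes) -> None:
        # Writing to an undeclared family is an error; a new qualifier is not.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


leaderboard = Table({"user_info", "game_scores"})
leaderboard.set_cell("alice", "user_info", "full_name", b"Alice Example")
leaderboard.set_cell("alice", "game_scores", "candy_royale", b"1200")
# A hypothetical third game needs no schema change, only a new qualifier.
leaderboard.set_cell("alice", "game_scores", "new_game", b"50")
print(len(leaderboard.families))  # still 2
```

Because every game shares one family, the family count stays fixed no matter how many games are added.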

Storing each game as a separate column qualifier within the game_scores column family lets you fetch all of a user's scores at once without also fetching user_info. It also keeps the number of column families manageable and lets each game's scores be kept as an independent time series (for example, as multiple timestamped cell versions). Mirroring the structure of the data brings other benefits as well.
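The per-game time series can be illustrated with a sketch. It assumes each `game_scores` qualifier holds several `(timestamp, score)` versions kept newest first, similar in spirit to Bigtable cell versions. The helpers `record`, `latest` and `history` are hypothetical, and the timestamps and scores are made-up example values.

```python
# Hypothetical model of the game_scores family: each qualifier (one per game)
# keeps several timestamped versions, newest first. Not the real client API.
from typing import Dict, List, Tuple

Version = Tuple[int, int]          # (timestamp, score)
Scores = Dict[str, List[Version]]  # qualifier (game name) -> versions


def record(scores: Scores, game: str, timestamp: int, score: int) -> None:
    """Add a new version for one game, keeping the newest timestamp first."""
    versions = scores.setdefault(game, [])
    versions.append((timestamp, score))
    versions.sort(reverse=True)


def latest(scores: Scores) -> Dict[str, int]:
    """Most recent score for every game (one cell per qualifier)."""
    return {game: versions[0][1] for game, versions in scores.items() if versions}


def history(scores: Scores, game: str) -> List[int]:
    """Time series for a single game, newest first, independent of other games."""
    return [score for _, score in scores.get(game, [])]


game_scores: Scores = {}
record(game_scores, "candy_royale", 1, 900)
record(game_scores, "candy_royale", 2, 1200)
record(game_scores, "clash_of_tanks", 1, 870)

print(latest(game_scores))                   # {'candy_royale': 1200, 'clash_of_tanks': 870}
print(history(game_scores, "candy_royale"))  # [1200, 900]
```

Each game's history grows independently, and reading the latest score for all games still touches only the one family.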

There is no speed-up from splitting data across multiple column families because they are stored in the same "locality group", i.e. the same file. Internally, Google does offer the option of placing different column families in different locality groups, but this is not exposed in the managed Cloud Bigtable service. See the comments on this answer.
