Database sharding vs partitioning

滥情空心 2021-01-29 17:46

I have been reading about scalable architectures recently. In that context, two words that keep showing up in discussions of databases are sharding and partitioning.

  •  一生所求
    2021-01-29 18:11

    Partitioning is a more generic term for dividing data across tables or databases. Sharding is one specific type of partitioning: a form of what is called horizontal partitioning, where the rows of a table are split across several instances.

    Here you replicate the schema across (typically) multiple instances or servers and use some kind of logic or identifier to determine which instance or server holds the data you are looking for. An identifier of this kind is often called a "Shard Key".
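    As a minimal sketch of routing by shard key, the snippet below hashes the key and takes it modulo the number of servers. The server names, the function name `shard_for`, and the choice of MD5 as a stable hash are all hypothetical illustrations, not something the answer prescribes.

```python
import hashlib

# Hypothetical list of database servers; in practice these would be
# connection strings or client objects.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(shard_key: str, shards: list = SHARDS) -> str:
    """Map a shard key to the server that stores its rows.

    Uses a stable hash (MD5 of the UTF-8 key) rather than Python's
    built-in hash(), which is randomized per process for strings.
    """
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for customer_id in ["alice", "bob", "carol"]:
        print(customer_id, "->", shard_for(customer_id))
```

    Plain modulo hashing like this remaps most keys whenever a server is added or removed, which is one reason to hash into a large, fixed number of logical shards instead, as in the Instagram example.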

    A common, key-less approach is to divide the data alphabetically: A-D goes to instance 1, E-G to instance 2, and so on. Customer data is well suited to this, but the instances will end up unevenly sized if the partitioning does not take into account that some letters are more common than others.
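    A rough sketch of the alphabetical scheme follows. The A-D and E-G ranges come from the answer; the remaining ranges, the instance names, and the sample surnames are hypothetical, chosen only to show how letter frequencies skew the row counts.

```python
from collections import Counter

# A-D and E-G follow the answer; H-M, N-S and T-Z are hypothetical fillers.
RANGES = [
    ("A", "D", "instance-1"),
    ("E", "G", "instance-2"),
    ("H", "M", "instance-3"),
    ("N", "S", "instance-4"),
    ("T", "Z", "instance-5"),
]


def instance_for(name: str) -> str:
    """Route a customer by the first letter of their name."""
    first = name.strip()[:1].upper()
    for low, high, instance in RANGES:
        if low <= first <= high:
            return instance
    raise ValueError(f"no range covers {name!r}")


if __name__ == "__main__":
    customers = ["Adams", "Brown", "Baker", "Clark", "Davis",
                 "Evans", "Smith", "Stone", "Scott", "Young"]
    # Count rows per instance to expose skew from uneven letter frequencies.
    print(Counter(instance_for(c) for c in customers))
```

    With this sample, instance-1 receives half of the rows while instance-3 receives none, which is the size imbalance the answer warns about.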

    Another common technique is to use a key-synchronization system or scheme that guarantees keys are unique across all instances.
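    One simple scheme for unique keys without a central coordinator, sketched here as an assumption rather than the technique the answer had in mind, is to give each instance its own starting offset and a shared stride. MySQL exposes the same idea through its `auto_increment_increment` and `auto_increment_offset` settings; the class below is a hypothetical standalone illustration.

```python
import itertools


class StridedIdGenerator:
    """Hypothetical per-instance ID generator.

    Instance k of n hands out k+1, k+1+n, k+1+2n, ..., so no two
    instances can ever produce the same ID and no coordination is
    needed at insert time.
    """

    def __init__(self, instance_index: int, instance_count: int) -> None:
        if not 0 <= instance_index < instance_count:
            raise ValueError("instance_index must be in [0, instance_count)")
        self._ids = itertools.count(start=instance_index + 1, step=instance_count)

    def next_id(self) -> int:
        return next(self._ids)


if __name__ == "__main__":
    generators = [StridedIdGenerator(i, 3) for i in range(3)]
    issued = [g.next_id() for _ in range(4) for g in generators]
    print(issued)
    assert len(issued) == len(set(issued))  # every key is unique
```

    The trade-off is that the instance count is baked into the stride, so adding instances later requires choosing a new, non-overlapping scheme.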

    A well-known example you can study is how Instagram solved partitioning in its early days. They partitioned from the get-go, using Postgres to divide the data across very few servers. I believe it was several thousand logical shards on those few physical shards. Read their awesome writeup from 2012 here: Instagram Engineering - Sharding & IDs
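    To make "thousands of logical shards on a few physical shards" concrete, here is a rough sketch that maps 8192 logical shards onto four servers and packs the logical shard into each generated ID. The 41/13/10 bit layout reflects my reading of the Instagram post and should be checked against it; the epoch, server names, modulo placement, and all function names are hypothetical.

```python
import time
from typing import Optional

# Assumed ID layout (verify against the Instagram post):
# 41 bits of milliseconds since a custom epoch | 13 bits logical shard | 10 bits sequence.
SHARD_BITS = 13
SEQUENCE_BITS = 10
LOGICAL_SHARDS = 1 << SHARD_BITS  # 8192 logical shards
CUSTOM_EPOCH_MS = 1_293_840_000_000  # hypothetical epoch: 2011-01-01 00:00:00 UTC

# Hypothetical physical servers. Each owns a contiguous block of logical shards;
# moving a logical shard later only means editing this map and copying its data.
PHYSICAL_SERVERS = ["pg-0", "pg-1", "pg-2", "pg-3"]
SHARD_MAP = {
    shard: PHYSICAL_SERVERS[shard * len(PHYSICAL_SERVERS) // LOGICAL_SHARDS]
    for shard in range(LOGICAL_SHARDS)
}


def logical_shard(user_id: int) -> int:
    """Pick a logical shard for a user (simple modulo, an assumption)."""
    return user_id % LOGICAL_SHARDS


def make_id(shard: int, sequence: int, now_ms: Optional[int] = None) -> int:
    """Pack elapsed time, logical shard and a per-shard sequence into one integer ID."""
    if not 0 <= shard < LOGICAL_SHARDS:
        raise ValueError("shard out of range")
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed = now_ms - CUSTOM_EPOCH_MS
    return (
        (elapsed << (SHARD_BITS + SEQUENCE_BITS))
        | (shard << SEQUENCE_BITS)
        | (sequence % (1 << SEQUENCE_BITS))
    )


def shard_of_id(object_id: int) -> int:
    """Recover the logical shard from an ID, so lookups need no extra index."""
    return (object_id >> SEQUENCE_BITS) & (LOGICAL_SHARDS - 1)


if __name__ == "__main__":
    user_id = 31341
    shard = logical_shard(user_id)
    new_id = make_id(shard, sequence=5000)
    print(shard, SHARD_MAP[shard], shard_of_id(new_id))
```

    Because the logical shard is encoded in the ID itself, any object can be routed to its server from its ID alone, and rebalancing moves whole logical shards between servers without changing any IDs.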

    See here as well: http://www.quora.com/Whats-the-difference-between-sharding-and-partition
