What Keeps Relational Databases From Horizontal Scaling?

Question

When I researched horizontal scaling for relational databases, I got the impression that the only option that scales writes as well as reads is sharding. Sharding seems to be a manual design process that involves complex, application-specific configuration and is hard to maintain if you need to change your sharding structure.

On the other hand, NoSQL databases seem to support horizontal scaling natively, but with the drawback of not supporting transactions, ACID guarantees, etc.

Another concept that has become popular recently is NewSQL databases. These promise to hit the sweet spot by being both ACID compliant and horizontally scalable, either through automatic sharding or some other innovative architecture.

My question is: if we are using a SAN with our relational database, isn't adding more database servers to the cluster and more disks to the SAN going to achieve horizontal scaling? (Adding disks will increase total disk IOPS and throughput as well as disk space.) What would be the bottleneck that makes a NewSQL database necessary to achieve both ACID and horizontal scaling?

Answer 1:

Horizontal scaling in relational databases is hard to achieve because when tables (or shards of the same table) are spread across different cluster nodes, joins usually become very inefficient. Additionally, there is the problem of replication: keeping ACID guarantees while ensuring that all replicas have fresh data. However, there is an RDBMS that scales horizontally: MySQL Cluster. From the docs:

MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost..

Auto-Sharding in MySQL Cluster

Unlike other sharded databases, users do not lose the ability to perform JOIN operations, sacrifice ACID-guarantees or referential integrity (Foreign Keys) when performing queries and transactions across shards.

At my company, we have been using MySQL Cluster for quite some time and it works really well (and scales horizontally). There is also Citus (recently released), which is built on top of PostgreSQL, but I haven't tried it personally.
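To illustrate the point about joins made at the start of this answer, here is a minimal sketch in Python. It is a toy model, not how MySQL Cluster or Citus work internally: the shard count, the modulo hashing, and the `users`/`orders` tables are all assumptions made up for the example. It only counts how many join lookups would have to leave the local shard when two tables are sharded on different keys.

```python
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key: int) -> int:
    """Pick a shard by simple modulo hashing on the sharding key."""
    return key % NUM_SHARDS


# Each shard holds its own slice of both (hypothetical) tables.
users = defaultdict(dict)   # shard -> {user_id: name}, sharded by user_id
orders = defaultdict(list)  # shard -> [(order_id, user_id, amount)], sharded by order_id


def insert_user(user_id: int, name: str) -> None:
    users[shard_for(user_id)][user_id] = name


def insert_order(order_id: int, user_id: int, amount: float) -> None:
    orders[shard_for(order_id)].append((order_id, user_id, amount))


def orders_with_user_names():
    """Join orders to users, counting lookups that must leave the local shard."""
    rows, remote_lookups = [], 0
    for order_shard in range(NUM_SHARDS):
        for order_id, user_id, amount in orders[order_shard]:
            user_shard = shard_for(user_id)
            if user_shard != order_shard:
                # In a real cluster this would be a network round trip.
                remote_lookups += 1
            rows.append((order_id, users[user_shard][user_id], amount))
    return sorted(rows), remote_lookups


insert_user(1, "alice")
insert_user(2, "bob")
insert_order(12, 1, 25.0)
insert_order(13, 2, 40.0)
insert_order(14, 1, 15.0)

rows, remote = orders_with_user_names()
print(rows)
print("remote lookups:", remote)
```

In this tiny data set every joined row needs a lookup on another shard. Sharding both tables on the same key (here, `user_id`) would keep the join local, which is the kind of application-specific design decision the question describes as hard to maintain.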

Answer 2:

The answer is "CAP Theorem"

You can have at most two of Consistency, Availability, and Partition Tolerance, but because network partitions cannot be ruled out in a distributed system, it typically boils down to

(Consistency OR availability) AND Partition Tolerance

Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency.[6]

With NoSQL, if a node drops out, the system stays up, but you may not get the latest data. This is of course a huge no-no in, say, banking or billing systems, but in a social media application it is usually of little consequence.
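The trade-off described above can be sketched with a toy two-replica store in Python. Everything here is an assumption made for illustration (the `TwoNodeStore` class, the `"CP"`/`"AP"` mode labels, the `partitioned` flag); it does not model any real database, only the choice between refusing requests and serving possibly stale data while the replicas cannot talk to each other.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, value) -> None:
        """Writes arrive at replica a and are copied to b when the link is up."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot reach all replicas")
        self.a.value = value
        if not self.partitioned:
            self.b.value = value

    def read_from_b(self):
        """Reads served by replica b, which may lag behind a."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm latest value")
        return self.b.value


# AP: stays up during the partition, but replica b serves an old balance.
ap = TwoNodeStore("AP")
ap.write(100)
ap.partitioned = True
ap.write(50)
ap_read = ap.read_from_b()
print("AP read during partition:", ap_read)

# CP: refuses the write rather than let the replicas diverge.
cp = TwoNodeStore("CP")
cp.write(100)
cp.partitioned = True
try:
    cp.write(50)
    cp_refused = False
except RuntimeError:
    cp_refused = True
print("CP refused write during partition:", cp_refused)
```

A stale balance of 100 instead of 50 is the kind of result a billing system cannot tolerate, while a slightly outdated timeline in a social feed is generally fine.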

More examples

http://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter
https://dzone.com/articles/understanding-the-cap-theorem
https://codahale.com/you-cant-sacrifice-partition-tolerance/

From this site

CAP theorem - Availability and Partition Tolerance

Source: https://stackoverflow.com/questions/48825977/what-keeps-relational-databases-from-horizontal-scaling

Tags

relational-database

san