Question
I am new to NoSQL and trying to understand its meaning.
I have seen many articles on many different websites repeat the claim that "SQL databases are scaled vertically (by adding CPU/memory) whereas NoSQL databases are scaled horizontally (by adding more machines that can perform distributed calculations)".
For example these articles:
http://dataconomy.com/sql-vs-nosql-need-know/
http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/
The thing is that I can't understand why.
As far as I am aware, the major difference between SQL and NoSQL (besides the scalability issue) is that SQL data is stored in tables, whereas NoSQL data is stored in other ways (key-value, graph, XML, etc.).
I can't seem to understand the connection between those two facts (scalability and storing strategy). These seem like unrelated things to me (probably due to lack of understanding).
Answer 1:
This is too long for a comment, and I admit that it contains opinions.
The articles are generally reasonable. Both NoSQL technologies and SQL technologies (for lack of a better term) have important roles to play nowadays --- as both articles point out. The discussion is somewhat reminiscent of hierarchical databases versus relational databases, once upon a time.
I disagree with the claimed scalability differences. The discussions leave out technologies such as Hive, PrestoDB, and BigQuery, which offer SQL interfaces in the spirit of traditional RDBMSs on top of highly scalable technologies.
The major differences between RDBMS and NoSQL (in my opinion) are ACID-compliance and data structure. The first is a "burden" that relational databases carry, for both better and worse -- definitely handy for financial transactions, but at the cost of overhead for other purposes. The second is an area where traditional databases are moving towards better handling of unstructured data, with direct support for nested tables, JSON, and XML formats. However, structure is important, as legions of data scientists probably learn the hard way as they interact with data.
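To make the ACID point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the names and amounts are all made up for illustration. It shows the "handy for financial transactions" side: either both updates are committed or neither is.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both UPDATEs were undone by the rollback

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False, nothing changes
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

On a single machine this guarantee is cheap. The overhead I mean shows up once the two rows may live on different machines and the database has to coordinate them.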
Large scalable key-value databases have been designed with "horizontal" scalability in mind. That, combined with the lack of pure ACID properties, facilitates re-balancing the data for new hardware -- assuming you have designed the database correctly (and that can be a large assumption).
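This is also where the storage model and scalability connect. If every record is looked up by its key alone, each key can be assigned to a machine independently of all the others. One common way to do that is consistent hashing. The sketch below is only an illustration of the idea, not how any particular product implements it; `HashRing`, the node names, and the `user:N` keys are hypothetical, and MD5 is used only to spread keys evenly.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node gets several positions to even out the load.
        for i in range(self._vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, h)
            self._owners[h] = node

    def node_for(self, key):
        # A key belongs to the first ring position clockwise from its hash.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # "new hardware"
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new node")
```

Only the keys that now belong to the new node have to be copied; everything else stays where it was. A relational schema with joins and multi-row transactions gives no such guarantee that related rows land on the same machine, which is why re-balancing is harder there (but, as I argue below, not impossible).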
Databases such as Oracle, DB2, and Teradata have supported parallel processing literally for decades (although more biased toward a single server, albeit with shared-nothing architectures). Their technology pre-dates the more modern Apache-based systems (for lack of a better term), but that doesn't mean that they cannot scale across multiple processors.
New databases such as Hive, Redshift, BigQuery, and PrestoDB provide SQL-based interfaces in the more modern "horizontally" scalable sense (at least for queries). A lot of work is going on in the Postgres world to support parallel processing there -- and the examples of databases such as Greenplum, Netezza, Vertica, and so on belie the idea that relational databases do not scale across multiple independent processors.
Source: https://stackoverflow.com/questions/33186945/why-is-sql-vertically-scalable-and-nosql-horizontally