问题
This question is meant to serve as a list of databases and their configurations that the major web sites use and would be a great reference for anyone thinking of scaling their web site to the size of Twitter, Facebook or even Google.
Please keep your answers to a minimum and be sure to cite any sources used.
EDIT:
Also, please bold both the web-site name and the database for easier scanning.
回答1:
Facebook.com
- MySQL with MyRocks. Used to store user info and social activities such as likes, comments, and shares.
- Hive (Data warehouse for Hadoop, supports tables and a variant of SQL called hiveQL). Used for "simple summarization jobs, business intelligence and machine learning and many other applications"
- Cassandra (Multi-dimensional, distributed key-value store). Currently used for Facebook's private messaging.
Currently running 610 (soon to be 1000) Hadoop nodes in a single cluster with Hive datastore. Both Hive and Cassandra have been open-sourced by Facebook.
Facebook stats:
- More than 200 million active users
- More than 100 million users log on to Facebook at least once each day
- More than 30 million users update their statuses at least once each day
- Average user has 120 friends on the site
Sources:
- http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/
- http://www.facebook.com/note.php?note_id=89508453919
- http://www.facebook.com/press/info.php?statistics
- http://hadoop.apache.org/hive/
- http://wiki.apache.org/hadoop/Hive/Design
- http://www.facebook.com/note.php?note_id=24413138919
- https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database
回答2:
Stack Overflow - SQL Server.
Jeff Atwood wrote a nice blog post on this
https://blog.stackoverflow.com/2008/09/what-was-stack-overflow-built-with/
回答3:
LinkedIn.com
- Oracle (Relational Database)
- MySQL (Relational Database)
Databases replicated on multiple servers for high availability. Each specific Service uses its own domain-specific DB.
LinkedIn stats:
- 22 million members
- 4+ million unique visitors/month
- 40 million page views/day
- 2 million searches/day
Sources:
- http://hurvitz.org/blog/2008/06/linkedin-architecture/
回答4:
Flickr uses MySQL.
YouTube uses MySQL but they are moving to Google's BigTable.
Myspace uses SQL Server.
Wikipedia uses MySQL.
回答5:
Microsoft.com
- SQL Server (no surprise there)
Microsoft.com stats:
- 250 million unique visits/month.
- 70 million page views/day.
- 15,000 connections/second.
- Maintains an average of 35,000 concurrent connections to a total of 80 Web servers.
Sources:
- http://technet.microsoft.com/en-us/mscomops/default.aspx
回答6:
Yahoo.com
- PostgreSQL (modified) - A client can connect to any of the nodes in the cluster (or a policy restricted subset). A query flows from the client to the server it chose to connect with. The SQL compiler on that node compiles and optimizes the query on that single node (no parallelism).
Yahoo.com stats:
- 24 billion events a day
- 2-petabyte, claims largest database (Mar 2008)
Source:
- http://perspectives.mvdirona.com/2008/05/23/PetascaleSQLDBAtYahoo.aspx
- http://www.computerworld.com/s/article/9087918/Size_matters_Yahoo_claims_2_petabyte_database_is_world_s_biggest_busiest
回答7:
Twitter.com
- MySQL (Relational Database).
- Cassandra (Multi-dimensional, distributed key-value store). Twitter is just "beginning to use Cassandra at Twitter" (see second source).
In May 2008, Twitter had 1 MySQL instance for writes with multiple MySQL slave instances for reads.
Twitter stats:
- Total Users: 1+ million
- Total Active Users: 200,000 per week
- Total Twitter Messages: 3 million/day
- 5% of Twitter users account for 75% of all activity
- 72.5% of all users joining during the first five months of 2009
Sources:
- http://blog.twitter.com/2008/05/its-not-rocket-science-but-its-our-work.html
- http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
- http://www.sysomos.com/insidetwitter/
- http://www.techcrunch.com/2008/04/29/end-of-speculation-the-real-twitter-usage-numbers/
回答8:
Digg
- MySQL (Relational Database) for scaling out reads
- MemcacheDB (Key-Value Store) for scaling out writes
Both data stores are distributed across multiple servers.
Digg stats:
- 30M users
- 26M uniques per month
- 2 billion requests a month
- 13,000 requests a second, peak at 27,000 requests a second.
Sources:
- http://www.krisjordan.com/2008/09/18/joe-stump-scaling-digg-and-other-web-applications/
- http://highscalability.com/scaling-digg-and-other-web-applications
回答9:
Google uses BigTable: http://research.google.com/archive/bigtable.html
回答10:
PlentyOfFish.com using Microsoft SQL Server:
https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/
来源:https://stackoverflow.com/questions/1113381/what-databases-do-the-world-wide-webs-biggest-sites-run-on