Ok, dumb question I know but I see the nebulous comment \'a large database\' as well as small and medium and I wonder just what that means. Can someone define what a small, medi
I think something like wikipedia, or the US census data is a 'big' database. My personal address lists or todos is a small database. A middle sized database is something in between.
You could try and define the sizes by how many servers you needed. A small database is a component of an application you run on your desktop, a mid-sized database would be a single mysql (whatever) server somewhere, and a large database is going to require multiple servers with some kind of replication/failover support.
“Large Database” is indeed a nebulous concept. There are already very different answers and opinions posted in the answers to this question. Some approaches to define “small”, “medium” and “large” Databases may make more sense than others BUT THEN, at some point, I consider each definition is right, true and valid.
Some definitions make more sense than others because they focus on different aspects of importance for the design, programming, use, maintenance and administration of a Database and these different aspects are what really matter for a usable Database. It just happens that all these aspects are impacted by the nebulous concept of “Database size”.
So, Does this mean that it does not matter if you are able to define if a particular Database is big or not?
Certainly not. What it mean is you will apply the concept differently while evaluating different design/operational/administrative aspects of your Database. It also means that every time this concept will be nebulous.
As an example: Database Index strategy (an aspect of Database design) is impacted by record count for each table (a measure of “size”), by record size times record count (another measure of “size”), and by Query Vs. Creation/Update/Delete operations ratio (an aspect of Database usage).
Query response times are better if indexes are used for tables with large amount of records. Depending on the nature of your WHERE, ORDER BY and record-aggregation clauses you may need several indexes for certain tables.
Creation, Update and Delete operations are impacted negatively with the increase of number of indexes on the affected table(s). More indexes for an affected table means more changes that the RDBMS must perform, spending more time and more resources to apply those changes.
Also, if your RDBMS spends more time to apply those changes, then the locks are maintained for longer times also, impacting the response times other queries being sent to the system at the same time.
So, How do you balance the quantity and design of your indexes? How do you know if you need an additional index and if by adding that index you will not be introducing a big negative impact on query response times? Answer: You test and profile your database against a target load as per your load/performance requirements and analyze the profiling data in order to discover if further optimizations/redesigns/indexes are needed.
Different Index strategies are required for different Query Vs. Creation/Update/Delete operations ratios. If your Database is under a heavy load of queries but is rarely updated, the performance for the overall application will be better if you add every index that improves query response times. On the other hand, if your Database is constantly being updated but there are not large query operations, then the performance will be better if you use less indexes.
There are other aspects of course: Database Schema design, Storage Strategy, Network design, Backup strategy, Stored Procedures/Triggers/Etc. programming, Application Programming (against the Database), Etc. All these aspects are impacted differently by distinct concepts of “size” (record size, record count, index size, index count, schema design, storage size, etc.).
I'd like to have more time as this topic is fascinating. I hope this small contribution serves as an starting point for you in this fascinating world of SQL.