OK, dumb question I know, but I see the nebulous comment 'a large database', as well as small and medium, and I wonder just what that means. Can someone define what a small, medi
One way to figure it is by observing your test queries.
A small database is one where indexes don't matter.
A medium database is one where queries take longer than one second if you don't have an appropriate index in place.
A big database is one where queries often take hours to optimize, using a combination of query design, index modification, and many test cycles.
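One way to run that kind of test is to time the same query with and without an index. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for your real system, and the `orders` table, the `customer_id` column, and the function names are made up for the example.

```python
import sqlite3
import time


def time_query(conn, sql, params=(), repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


def index_speedup(row_count):
    """Time one lookup before and after adding an index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id) VALUES (?)",
        ((i % 1000,) for i in range(row_count)),
    )
    sql = "SELECT id FROM orders WHERE customer_id = ?"

    # Without an index, the lookup has to scan the whole table.
    without_index = time_query(conn, sql, (42,))

    # With an index, the same lookup can seek directly to matching rows.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    with_index = time_query(conn, sql, (42,))

    conn.close()
    return without_index, with_index


if __name__ == "__main__":
    for rows in (1_000, 100_000, 1_000_000):
        slow, fast = index_speedup(rows)
        print(f"{rows:>9} rows: no index {slow:.6f}s, with index {fast:.6f}s")
```

If the two timings stay close as the row count grows, you are still in "small" territory by this definition; once the unindexed query crosses about a second, you have reached "medium".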
Large databases are ones that force you to stop using relational databases.
In other words, a large database is a normalized, relational database where all the indexes in the world can't help you meet your response time requirements because of the massive JOINs.
If you've ever had to abandon relational databases for something else, you either are a poor database developer, have no expert DBA, or have a very large database.
You have to account for hardware advancement for this definition:
Small database: working set fits into the physical RAM of a single commodity server (about 16GB now)
Medium database: fits onto one or several (through RAID) commodity hard drives on a single machine (up to several TBs now)
Large database: data needs to be distributed across multiple commodity servers in order to fit (up to several PBs now)
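As a rough sketch, the three tiers above can be written as a function of the working set size and the total data size. The function name and the default limits are assumptions for illustration: 16 GB of RAM comes from the figure above, and 4 TB is just one reading of "several TBs". Both should be adjusted as hardware moves on.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def classify_by_hardware(working_set_bytes, total_bytes,
                         ram_bytes=16 * GB, disk_bytes=4 * TB):
    """Classify a database by what hardware it needs (hypothetical helper).

    ram_bytes:  RAM of a single commodity server (assumed default: 16 GB).
    disk_bytes: local disk capacity of a single machine (assumed default: 4 TB).
    """
    if working_set_bytes <= ram_bytes:
        return "small"   # working set fits in RAM on one server
    if total_bytes <= disk_bytes:
        return "medium"  # data fits on the disks of one machine
    return "large"       # data must be spread over several servers


if __name__ == "__main__":
    print(classify_by_hardware(8 * GB, 100 * GB))
    print(classify_by_hardware(64 * GB, 1 * TB))
    print(classify_by_hardware(64 * GB, 100 * TB))
```

Passing the limits as parameters keeps the definition tied to the hardware of the day rather than to fixed numbers.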
There isn't a threshold where a small database becomes medium or a medium database becomes large. Generally, when I hear these terms, I think of particular orders of magnitude in terms of total records being stored.
As poster dkretz suggested, you could also think about it in terms of the properties each kind of database has. Categorizing it this way, I'd say:
Small: Performance is not a concern. Your queries run fine without making any special optimizations. You see only a marginal performance difference when using front-line enhancements like indexes.
Medium: Your database probably has one or more staff that are assigned part-time to its maintenance and care. These people pay attention to the database's health; their primary administrative responsibility is to prevent unacceptable performance problems and minimize downtime.
Large: Probably has dedicated staff member(s) whose job is to work on the database and improve performance, as well as make sure that application changes don't cause schema breakage over the lifetime of the database. Metrics about the health and status of the database are monitored closely. Significant expertise is required to understand and perform optimizations.
Very large: The database stores vast amounts of information that must be readily accessible. Performance optimizations are absolutely required to wring every last ounce of speed out of each query, and without them, the database would be much less usable or even impossible to use. The database may be using sophisticated or innovative replication or clustering techniques, pushing the boundaries of current technology.
Note that these are entirely subjective, and that someone may very well have a perfectly legitimate alternate definition of "large".
According to the Wikipedia article on Very Large Database:
A very large database, or VLDB, is a database that contains an extremely high number of tuples (database rows), or occupies an extremely large physical filesystem storage space. The most common definition of VLDB is a database that occupies more than 1 terabyte or contains several billion rows, although naturally this definition changes over time.
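Taken literally, that definition is a simple either/or check. Here is a minimal sketch; the function name is made up, 1 terabyte is taken as 10^12 bytes, and "several billion rows" is pinned to 2 billion purely as an assumption so that there is a concrete number to compare against.

```python
TERABYTE = 10 ** 12  # decimal terabyte, in bytes


def is_vldb(size_bytes, row_count,
            size_threshold=TERABYTE, row_threshold=2_000_000_000):
    """Return True if either quoted VLDB criterion is met (hypothetical helper).

    row_threshold is an assumed stand-in for "several billion rows".
    """
    return size_bytes > size_threshold or row_count >= row_threshold


if __name__ == "__main__":
    print(is_vldb(5 * TERABYTE, 10_000_000))      # large on disk, few rows
    print(is_vldb(200 * 10 ** 9, 3_000_000_000))  # small on disk, many rows
    print(is_vldb(200 * 10 ** 9, 10_000_000))     # neither criterion met
```

Since the quoted definition changes over time, both thresholds are parameters rather than constants baked into the check.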
If you have a database that is large enough that you can't just "back it up" to put on a development or test box, you likely have a "large database".