I need to fit additional data into a database, and I have a choice between modifying an existing table (table_existing) or creating new tables.
This is how table_
I am a SQL Server DBA, so I will suggest what I would do in SQL Server 2008.
Add the columns to the existing table as nullable, marking them as SPARSE. The SPARSE attribute means the extra columns take no storage in the existing table's pages while they are NULL, and you can still query them as ordinary columns. Sparse columns can also be grouped into a column set, which SQL Server exposes as XML that can be queried or displayed.
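A minimal sketch of the change, assuming table_existing is the table from the question and the new column names and types are placeholders:

    -- Add the optional columns as SPARSE; NULLs in them take no storage
    ALTER TABLE table_existing
        ADD ExtraField1 int         SPARSE NULL,
            ExtraField2 varchar(50) SPARSE NULL;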
If there are legacy apps which cannot handle the new table structure, you can create a view that presents only the original columns and have those apps use the view instead.
If you have a version which does not support sparse columns, build a single child table for your existing table, linking the child to the parent with the ID of the parent table. Create a view across the two tables to present the data.
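A sketch of that layout, assuming table_existing has an ID primary key and a Field1 column; the extra column names are placeholders:

    -- Child table holds the optional columns, keyed 1:1 to the parent
    CREATE TABLE table_existing_extra (
        ID          int         NOT NULL PRIMARY KEY
                    REFERENCES table_existing (ID),
        ExtraField1 int         NULL,
        ExtraField2 varchar(50) NULL
    );
    GO

    -- View presents the parent and the optional columns as one row
    CREATE VIEW v_table_existing AS
    SELECT e.ID, e.Field1, x.ExtraField1, x.ExtraField2
    FROM table_existing AS e
    LEFT JOIN table_existing_extra AS x ON x.ID = e.ID;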
I would agree with DVK that if you opt for (B) you will end up having to query against several tables to get all of your original Field1 values, let alone the added complexity of JOINs. This wouldn't make sense unless the split into separate tables also corresponded to a separation into different entities.
I agree with Paul in that your question can't really be answered without knowing the details of the entities involved and the sorts of queries and updates you will be running.
Are your queries more likely to need to combine rows from the (XX,1) set with the (YY,2) set, etc.?
If not, then splitting into separate tables is faster, since the individual tables used for all queries are narrower.
If they do, then the split design might be marginally slower, since you would need UNIONs, which means running duplicate queries against the main table.
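For illustration, a hypothetical query against a split design (all table and column names here are invented), where the main table has to be read once per sub-table:

    -- The main table is queried twice, once per optional-column table
    SELECT e.ID, e.Field1, a.OptionalA AS OptionalValue
    FROM table_existing AS e
    JOIN table_extra_a AS a ON a.ID = e.ID
    UNION ALL
    SELECT e.ID, e.Field1, b.OptionalB
    FROM table_existing AS e
    JOIN table_extra_b AS b ON b.ID = e.ID;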
I remember having these doubts before.
From a data validation perspective, option (B) turns out to be more favorable. You can place constraints on the fields better. This is precisely why you would want to split, say, a users table into students, teachers, etc., to enforce NOT NULL constraints depending on the role of the user.
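A minimal sketch of that kind of split; every column name here is invented for illustration:

    CREATE TABLE users (
        user_id int          NOT NULL PRIMARY KEY,
        name    varchar(100) NOT NULL
    );

    -- Role-specific tables can enforce NOT NULL on fields that only apply to that role
    CREATE TABLE students (
        user_id        int NOT NULL PRIMARY KEY REFERENCES users (user_id),
        enrolment_year int NOT NULL
    );

    CREATE TABLE teachers (
        user_id    int         NOT NULL PRIMARY KEY REFERENCES users (user_id),
        department varchar(50) NOT NULL
    );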
Generally, having a lot of NULL values in your table is bad for performance, because heavily NULL columns are difficult to index effectively.
As a rule of thumb, as long as the number of tables involved in your joins is 4 or less, you don't have to worry about a performance hit.
Edit: If you're worried about the number of tables in your database, I suggest you look here.
What is the more optimal database structure from the speed point of view?
Well, what is correct, best practice, etc., is called Normalisation. If you do that correctly, there will be no optional columns (they are columns, not fields) and no Nulls; the optional columns will be in separate tables, with fewer rows. Sure, you can arrange those tables so that each one holds a set of related optional columns, rather than (one PK plus) one column each.
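A rough sketch of that arrangement, with every table and column name invented for illustration:

    -- Parent table: only the columns every row has
    CREATE TABLE thing (
        thing_id int          NOT NULL PRIMARY KEY,
        Field1   varchar(100) NOT NULL
    );

    -- Each sub-table holds one set of optional columns; a row exists only when there are values
    CREATE TABLE thing_shipping (
        thing_id     int          NOT NULL PRIMARY KEY REFERENCES thing (thing_id),
        ship_address varchar(200) NOT NULL,
        ship_date    date         NOT NULL
    );

    CREATE TABLE thing_billing (
        thing_id     int          NOT NULL PRIMARY KEY REFERENCES thing (thing_id),
        bill_address varchar(200) NOT NULL
    );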
Combining the rows from the sub-tables into one 5NF row is easy: do that in a view (but do not update via the view; apply updates directly to each sub-table, via a transactional stored proc).
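Continuing the sketch above (all names are still assumptions), the presentation view and a transactional update proc for one sub-table might look like this:

    -- Presentation view re-assembles the full row; read from it, do not update through it
    CREATE VIEW thing_full AS
    SELECT t.thing_id, t.Field1,
           s.ship_address, s.ship_date,
           b.bill_address
    FROM thing AS t
    LEFT JOIN thing_shipping AS s ON s.thing_id = t.thing_id
    LEFT JOIN thing_billing  AS b ON b.thing_id = t.thing_id;
    GO

    -- Updates go directly to the sub-table, wrapped in a transaction
    CREATE PROCEDURE thing_shipping_upd
        @thing_id     int,
        @ship_address varchar(200),
        @ship_date    date
    AS
    BEGIN
        SET XACT_ABORT ON;
        BEGIN TRANSACTION;
            IF EXISTS (SELECT 1 FROM thing_shipping WHERE thing_id = @thing_id)
                UPDATE thing_shipping
                SET ship_address = @ship_address, ship_date = @ship_date
                WHERE thing_id = @thing_id;
            ELSE
                INSERT thing_shipping (thing_id, ship_address, ship_date)
                VALUES (@thing_id, @ship_address, @ship_date);
        COMMIT TRANSACTION;
    END;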
More, smaller tables are the nature of a Normalised Relational database. Get used to it. Fewer, larger tables are slower, due to the lack of Normalisation, the duplicates and the Nulls. Joining is cumbersome in SQL, but that is all we have. There is no cost in the joins themselves, only in the tables being joined (rows, row width, join columns, datatypes, mismatches, indices [or not]). Databases are optimised for Normalised tables and for large numbers of them, not for data heaps.
Which happens to be optimal re performance, no surprise. For two reasons:
The tables are narrower, so there are more rows per page, you get more rows per physical I/O, and more rows in the same cache space.
Since you have no Nulls, those columns are fixed length, so there is no unpacking required to extract the contents of the column.
There are no pros for large tables with many optional (null) columns, only cons. There never is a pro for breaching standards.
The answer is unchanged regardless of whether you are contemplating 4 or 400 new tables.