SQL Server: Any value in vertical partitioning when I'm always going to re-JOIN them?

Question

I'm faced with having to add 64 new columns to a table that already has 32 columns. For example's sake:

Customers
(
    CustomerID int
    Name        varchar(50)
    Address     varchar(50)
    City        varchar(50)
    Region      varchar(50)
    PostalCode  varchar(50)
    Country     varchar(2)
    Telephone   varchar(20)

    ...
    NewColumn1  int null
    NewColumn2  uniqueidentifier null
    NewColumn3  varchar(50)
    NewColumn4  varchar(50)
    ...
    NewColumn64 datetime null

    ...
    CreatedDate datetime
    LastModifiedDate datetime
    LastModifiedWorkstation varchar(50)
    LastModifiedUser varchar(50)
)

Most of the time the majority of these new columns will contain null.

It is also a given that if I vertically partition off these 64 new columns into a new table, then every SELECT from Customers:

SELECT ...
FROM Customers

will have to be converted to a join to get the partitioned values (i.e. there is never a case where I don't require the new columns, so there is no performance gain to be had from skipping them):

SELECT ...
FROM Customers
    INNER JOIN Customers_ExtraColumns
    ON Customers.CustomerID = Customers_ExtraColumns.CustomerID

So that's one con to partitioning off the columns.

The other con is that I have to manage inserting rows into two tables simultaneously, rather than just one.

The final con I can think of is that SQL Server now has to perform an INNER JOIN any time I want to access "Customers". There will now and forever be a waste of CPU and I/O to join tables that really are one table - except that I had decided to split them up.

So my question is: why would I split them up?

Is there any value in vertically partitioning out 64 columns to a separate table when they will mostly be null? NULLs take up very little space.

What are the pros?

Edit: Why am I even considering partitioning? It's mostly null data that will triple the number of columns in the table. Surely it must be bad!

Answer 1:

For simplicity of data model, without further information, I would probably not partition, but you haven't indicated the nature of the data in these new columns (perhaps some columns are arrays which should be normalized instead).

However, some points:

If you do vertically partition and put a FK constraint on the supplemental table, that may help the optimizer eliminate the join in some scenarios, since it knows the matching base row must exist. The supplemental table will obviously be indexed on the same unique key, so the optimizer also knows that each customer matches either 0 or 1 supplemental rows and the join can never multiply rows.

You can have a single updatable view which joins the two tables and have a trigger on the view which inserts into the two tables joined to make the view. You could also decide to do a left join and only create a supplemental row at all if any of the columns needing it are non-NULL.
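As a rough sketch of the view-plus-trigger idea, assuming a trimmed-down schema (only `Name` and two of the new columns are shown, and the names `CustomersFull` and `CustomersFull_Insert` are hypothetical), it might look something like this. `GO` is the batch separator used by the SQL Server client tools, not a T-SQL statement.

```sql
-- Hypothetical, trimmed-down versions of the two tables.
CREATE TABLE Customers
(
    CustomerID int NOT NULL PRIMARY KEY,
    Name       varchar(50) NOT NULL
);

CREATE TABLE Customers_ExtraColumns
(
    CustomerID int NOT NULL PRIMARY KEY
        REFERENCES Customers (CustomerID),
    NewColumn1 int NULL,
    NewColumn3 varchar(50) NULL
);
GO

-- LEFT JOIN so customers without a supplemental row still appear.
CREATE VIEW CustomersFull
AS
SELECT c.CustomerID, c.Name, x.NewColumn1, x.NewColumn3
FROM Customers AS c
    LEFT JOIN Customers_ExtraColumns AS x
    ON c.CustomerID = x.CustomerID;
GO

-- Route inserts against the view into the two underlying tables.
CREATE TRIGGER CustomersFull_Insert ON CustomersFull
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO Customers (CustomerID, Name)
    SELECT CustomerID, Name
    FROM inserted;

    -- Only create a supplemental row when at least one extra column has a value.
    INSERT INTO Customers_ExtraColumns (CustomerID, NewColumn1, NewColumn3)
    SELECT CustomerID, NewColumn1, NewColumn3
    FROM inserted
    WHERE NewColumn1 IS NOT NULL
       OR NewColumn3 IS NOT NULL;
END;
GO
```

With this in place, application code would insert into `CustomersFull` as if it were a single table. Updates and deletes through the view would need their own INSTEAD OF triggers, which are not shown here.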

You can also use a sparsely joined set of tables of supplemental data. Obviously this would also need joins, but you could use the same techniques with multiple supplemental tables as you would with one.

Answer 2:

If these values are a) unique to a record (a given customer should only have one value which would go in NewColumn1) and b) not used by any other record (at least, no other record that doesn't also require the base customer information) I'd say leave them as one table. Just don't forget to name your specific columns in any queries you write against the table.

I come from an EDI background, and sometimes you have to deal with flat files that contain 30+ columns of data per row. As you mention, NULL doesn't take up much room, and if you're NEVER going to be grabbing the columns independently (and you'll never be able to grab the base customer data independently), I'd say you've got it right.

Answer 3:

The answer is in details that were omitted from the question. The number of columns is irrelevant; it is the nature of the data that matters.

First, remember that a given row in any table can never exceed 8060 bytes. So if the new columns are sized such that that limit can theoretically be exceeded, you will have built a time-bomb into the database. Sometime when it is least convenient, a data insert or update will throw an error and/or data will be lost.
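One way to check for this ahead of time is to add up the declared maximum sizes of the columns. The query below is only a sketch: it assumes the table lives in the `dbo` schema and that the `sys.columns` catalog view is available (SQL Server 2005 and later; on SQL Server 2000 the older `syscolumns` table and its `length` column would be the place to look). It also ignores per-row overhead, so the result should be comfortably below 8060, not just barely under it.

```sql
-- Sum the declared maximum byte length of every column in Customers.
-- max_length is -1 for (max) types, so those columns are excluded here.
SELECT SUM(CAST(c.max_length AS int)) AS DeclaredMaxBytes
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.Customers')
  AND c.max_length > 0;
```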

To guard against this, you may need to use more than one table. It's just a limitation of most editions of SQL Server.

The other important consideration is data modeling. Do the new columns have a one-to-one relationship with CustomerID, the way something like eyeColor would?

Because of the number of columns and the fact that you omitted their names, I suspect that a non-normalized design is being contemplated. If the new columns are something like WebPage1, WebPage2, WebPage3, etc., then these need to be split into a separate, normalized table.
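For illustration only, a normalized version of the WebPage1, WebPage2, WebPage3 case could look roughly like the following. The table name `CustomerWebPages` and its columns are hypothetical, and the sketch assumes `Customers` has a primary key on `CustomerID` and a `Name` column.

```sql
-- Hypothetical child table: one row per web page,
-- instead of repeating WebPage1, WebPage2, WebPage3 columns on Customers.
CREATE TABLE CustomerWebPages
(
    CustomerID int NOT NULL
        REFERENCES Customers (CustomerID),
    PageNumber int NOT NULL,
    Url        varchar(200) NOT NULL,
    PRIMARY KEY (CustomerID, PageNumber)
);

-- Each customer comes back once per web page (or once with a NULL Url if there are none).
SELECT c.CustomerID, c.Name, w.Url
FROM Customers AS c
    LEFT JOIN CustomerWebPages AS w
    ON c.CustomerID = w.CustomerID;
```

A customer with no web pages simply has no rows in the child table, so nothing is stored for them at all.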

But, if the columns really are unique items, unrelated to each other and with a 1-to-1 relationship to CustomerID (or whatever the primary-key of that table is), and the size limit cannot be busted, then having everything in one table is perfectly fine.

Source: https://stackoverflow.com/questions/3496618/sql-server-any-value-in-vertical-partitioning-when-im-always-going-to-re-join

Tags

sql-server-2000

vertical-partitioning