Database partitioning - Horizontal vs Vertical - Difference between Normalization and Row Splitting?

有刺的猬 2021-01-30 04:01

I am trying to grasp the different concepts of Database Partitioning and this is what I understood of it:

Horizontal Partitioning/Sharding: Spl

4 Answers
  •  时光取名叫无心
    2021-01-30 04:54

    Partitioning is a rather general concept and can be applied in many contexts. When it comes to partitioning relational data, it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).

    Vertical partitioning, aka row splitting, uses the same splitting techniques as database normalization, but usually the term (vertical / horizontal) data partitioning refers to a physical optimization, whereas normalization is an optimization on the conceptual level.

    Since you ask for a simple demonstration - assume you have a table like this:

    create table data (
        id integer primary key, 
        status char(1) not null, 
        data1 varchar2(10) not null, 
        data2 varchar2(10) not null);
    

    One way to partition data vertically: Split it as follows:

    create table data_main (
        id integer primary key,
        status char(1) not null,
        data1 varchar2(10) not null );
    
    create table data_rarely_used (
        id integer primary key,
        data2 varchar2(10) not null,
        foreign key (id) references data_main (id) );
    

    This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. Partition data_main will take less space, hence full table scans will be faster and it is more likely that it fits into the DBMS' page cache. The downside: when you have to query all columns of data, you obviously have to join the tables, which will be more expensive than querying the original table.
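
    To make the join cost concrete, here is a minimal sketch of how the original row shape could be reassembled from the two partitions. The view name data_all is made up for illustration, and the inner join assumes every id in data_main has a matching row in data_rarely_used (which follows from data2 being not null in the original table, provided all rows were migrated):

    ```sql
    -- Hypothetical view that glues the two vertical partitions back together.
    -- Assumption: each data_main row has exactly one data_rarely_used row.
    create view data_all as
    select m.id, m.status, m.data1, r.data2
      from data_main m
      join data_rarely_used r on r.id = m.id;
    ```

    Queries that only need id, status and data1 would keep reading data_main directly; only queries that really need data2 would go through the view and pay for the join.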

    Notice that you are splitting the columns in the same way as you would when normalizing tables. However, in this case data could already be normalized to 3NF (and even BCNF and 4NF), and you decide to split it further purely for reasons of physical optimization.

    One way to partition data horizontally, using Oracle syntax:

    create table data (
        id integer primary key, 
        status char(1), 
        data1 varchar2(10), 
        data2 varchar2(10) )
        partition by list (status) ( 
           partition active_data values ( 'A' ),
           partition other_data values(default) 
        );
    

    This would tell the DBMS to internally store the table data in two segments (like two tables), depending on the value of the column status. This way of partitioning data can be applied, for example, when you usually query only rows of one partition, e.g., the status 'A' rows (let's call them active rows). Like before, full scans will be faster (particularly if there are only a few active rows), the active rows (and the other rows, respectively) are stored contiguously (they won't be scattered around pages that they share with rows of a different status value), and it is more likely that the active rows will be in the page cache.
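
    As a rough illustration of how queries benefit from this layout, the following sketch shows two ways of reading only the active rows from the partitioned table above (Oracle syntax; whether the optimizer actually prunes partitions depends on the query and should be checked in the execution plan):

    ```sql
    -- Filtering on the partitioning column lets the optimizer
    -- restrict the scan to the active_data partition.
    select id, data1, data2
      from data
     where status = 'A';

    -- Alternatively, name the partition explicitly.
    select count(*)
      from data partition (active_data);
    ```

    The first form is usually preferable, because the application does not need to know how the table is partitioned.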
