What is a good way to horizontally shard in PostgreSQL?

感情败类 2020-12-23 02:24

1. pgpool 2
2. gridsql

Which is the better way to do sharding?

also is it possibl

  • 2020-12-23 03:02

    Well, if the question is about sharding, then pgpool and PostgreSQL's partitioning features are not valid answers.

    Partitioning assumes the partitions are on the same server. Sharding is more general and is usually used when the database is split across several servers. Sharding is used when partitioning is no longer possible, e.g. for a large database that cannot fit on a single disk.

    For true sharding, Skype's pl/proxy is probably the best option.

  • 2020-12-23 03:07

    The best practice for building a PostgreSQL cluster is to use:

    1. PostgreSQL partitioning (range or list).
    2. PostgreSQL partitions combined with tablespaces spread across several SSDs.
    3. The PostgreSQL FDW (foreign data wrapper) extension.
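
    To illustrate item 3 together with partitioning, here is a minimal sketch that places one partition of a table on another server through postgres_fdw. It assumes PostgreSQL 11 or later (which can route inserts into foreign-table partitions) and the postgres_fdw contrib module; the server name shard1, its host, the credentials and the events tables are all hypothetical, and the remote events_2006 table must already exist on that server.

    ```sql
    -- Assumes PostgreSQL 11+ and the postgres_fdw contrib module.
    -- shard1, its host/credentials and the events tables are hypothetical.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Describe the remote server; no connection is made at this point.
    CREATE SERVER shard1
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', port '5432', dbname 'metrics');

    -- Credentials used when the local user queries shard1.
    CREATE USER MAPPING FOR CURRENT_USER
        SERVER shard1
        OPTIONS (user 'app', password 'secret');

    -- Parent table, partitioned by date.
    CREATE TABLE events (
        id      bigint NOT NULL,
        logdate date   NOT NULL,
        payload text
    ) PARTITION BY RANGE (logdate);

    -- Recent data stays on the local server.
    CREATE TABLE events_2007 PARTITION OF events
        FOR VALUES FROM ('2007-01-01') TO ('2008-01-01');

    -- Older data lives on shard1, in a remote table named events_2006.
    CREATE FOREIGN TABLE events_2006 PARTITION OF events
        FOR VALUES FROM ('2006-01-01') TO ('2007-01-01')
        SERVER shard1
        OPTIONS (table_name 'events_2006');
    ```

    Nothing connects to shard1 until a query actually touches events_2006, and queries on events that filter on logdate let the planner skip partitions that cannot match.
    
    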

    Alternative: Postgres-XL

    For sharding (load balancing), you can use:

    1. Postgres-BDR
    2. Postgres-X2

    Note:

    The purpose of a cluster is to hold a big dataset; it is mostly used for data warehousing.

    The purpose of sharding is load balancing; it is mostly used for high-transaction databases.

    ** WARNING **

    Avoid pgpool, because its heavy overhead will lead to issues in the future.

    Hope this answer helps in your future development.

  • 2020-12-23 03:09

    pl/proxy (by Skype) is a good solution for this. It requires all access to go through a function API, but once you have that, it can make sharding fairly transparent.
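
    In pl/proxy, a proxy function names a CLUSTER and a RUN ON hashtext(...) clause, and the hash of the key decides which partition database executes the call. The sketch below does not use pl/proxy itself; it is a hypothetical plain PL/pgSQL helper, shard_for, that mimics that routing rule as I understand it from the pl/proxy documentation: mask the hash with the partition count minus one, which only covers every partition when the count is a power of two.

    ```sql
    -- Hypothetical helper, not part of pl/proxy: picks a shard number for a key
    -- the way a hash-and-mask router would.
    CREATE OR REPLACE FUNCTION shard_for(shard_key text, n_shards int)
    RETURNS int AS $$
    BEGIN
        -- Masking with (n_shards - 1) reaches every shard only for powers of two.
        IF n_shards <= 0 OR (n_shards & (n_shards - 1)) <> 0 THEN
            RAISE EXCEPTION 'shard count must be a power of two, got %', n_shards;
        END IF;
        -- hashtext() is PostgreSQL's internal text hash (the one pl/proxy examples
        -- use); keep only its lowest bits, giving a value in 0 .. n_shards - 1.
        RETURN hashtext(shard_key) & (n_shards - 1);
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    -- Which of 4 shards would each username be routed to?
    SELECT username, shard_for(username, 4) AS shard
    FROM (VALUES ('alice'), ('bob'), ('carol')) AS t(username);
    ```

    Because the result depends on the shard count, changing the number of shards later moves many keys to a different shard, so it is worth choosing the count (or a larger number of logical partitions) up front.
    
    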

  • 2020-12-23 03:12

    PostgreSQL allows partitioning in two different ways: by range and by list. Both use table inheritance to implement the partitions.
    Partitioning by range, usually a date range, is the most common, but partitioning by list can be useful if the values that define the partitions are static and not skewed.
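
    The inheritance approach below is how partitioning worked before PostgreSQL 10. From version 10 onward, declarative partitioning creates range and list partitions directly and routes inserted rows without any trigger (version 11 also added hash partitioning). A minimal sketch, assuming PostgreSQL 10 or later and a fresh database, since it reuses the measurement names; the customers table is hypothetical:

    ```sql
    -- Declarative partitioning (PostgreSQL 10+): the partition bounds replace
    -- the CHECK constraints and the insert trigger of the inheritance approach.
    CREATE TABLE measurement (
        city_id  int  NOT NULL,
        logdate  date NOT NULL,
        peaktemp int
    ) PARTITION BY RANGE (logdate);

    -- Range bounds: FROM is inclusive, TO is exclusive.
    CREATE TABLE measurement_y2006 PARTITION OF measurement
        FOR VALUES FROM ('2006-01-01') TO ('2007-01-01');
    CREATE TABLE measurement_y2007 PARTITION OF measurement
        FOR VALUES FROM ('2007-01-01') TO ('2008-01-01');

    -- List partitioning suits a small, static set of key values.
    CREATE TABLE customers (
        id     int  NOT NULL,
        region text NOT NULL
    ) PARTITION BY LIST (region);

    CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('eu');
    CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('us');

    -- Inserts go through the parent and land in the matching partition.
    -- A row that fits no partition is rejected, since no DEFAULT partition exists.
    INSERT INTO measurement VALUES (1, '2007-05-01', 20);
    INSERT INTO customers VALUES (1, 'eu');

    -- Show which partition each row was stored in.
    SELECT tableoid::regclass AS stored_in, * FROM measurement;
    ```
    
    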

    Partitioning is done with table inheritance, so the first step is to create the parent table and then the child tables, each with a CHECK constraint describing the rows it holds.

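    CREATE TABLE measurement (
        city_id  int  not null,
        logdate  date not null,   -- partition key, used by the CHECK constraints below
        peaktemp int
    );
    
    -- Each child inherits the parent's columns; its CHECK constraint
    -- enforces the date range that child holds.
    CREATE TABLE measurement_y2006 ( 
        CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2007-01-01' )
    ) INHERITS (measurement);
    
    CREATE TABLE measurement_y2007 (
        CHECK ( logdate >= DATE '2007-01-01' AND logdate < DATE '2008-01-01' ) 
    ) INHERITS (measurement);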
    

    Then either rules or triggers are needed to route rows inserted into the parent table to the correct child table. Rules can be faster for bulk inserts, because their overhead is paid once per statement rather than once per row; triggers are usually faster otherwise and are easier to maintain. Here is a sample trigger. The trigger function that does the insert has to exist before the trigger, so create it first:

    CREATE OR REPLACE FUNCTION measurement_insert_trigger()
    RETURNS TRIGGER AS $$
    BEGIN
        -- Send the new row to the child table whose CHECK range it matches.
        IF ( NEW.logdate >= DATE '2006-01-01' 
             AND NEW.logdate < DATE '2007-01-01' ) THEN
            INSERT INTO measurement_y2006 VALUES (NEW.*);
        ELSIF ( NEW.logdate >= DATE '2007-01-01' 
                AND NEW.logdate < DATE '2008-01-01' ) THEN
            INSERT INTO measurement_y2007 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'Date out of range.';
        END IF;
        -- Returning NULL from a BEFORE trigger skips the insert into the parent.
        RETURN NULL;
    END;
    $$
    LANGUAGE plpgsql;
    

    and then the trigger itself, which fires for every row inserted into the parent:

    CREATE TRIGGER insert_measurement_trigger
        BEFORE INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
    

    These examples are simplified from the PostgreSQL documentation for easier reading.

    I am not familiar with pgpool2, but GridSQL is a commercial product designed for EnterpriseDB, a commercial database built on top of PostgreSQL. Their products are very good, but I do not think GridSQL will work on standard PostgreSQL.
