Database that can handle more than 500 million rows

Asked by 半阙折子戏 on 2020-12-01 01:23

I am looking for a database that can handle more than 500 million rows: it should be able to create an index on a column in a reasonable time and return results for SELECT queries in less than 3 seconds.

9 answers
  • 2020-12-01 01:59

    Pretty much every decent database can easily handle a billion rows today. 500 million rows is doable even on 32-bit systems, although 64-bit helps a lot.

    The main requirements are:

    • You need enough RAM. How much is enough depends on your queries.
    • You need a good enough disk subsystem. If you want to do large selects, a single disk for everything is out of the question; you need many spindles (or an SSD) to handle the I/O load.
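    To make "enough RAM" a little more concrete, here is a back-of-envelope Python sketch that estimates the size of a single B-tree index over 500 million rows, so it can be compared with the memory available. The per-entry size (20 bytes, roughly a 4-byte integer key plus a row pointer and per-entry overhead) and the 90% page fill factor are assumptions for illustration, not figures for any particular engine; substitute your database's real numbers.

```python
# Back-of-envelope B-tree index size; every constant below is an assumption.
ROWS = 500_000_000
ENTRY_BYTES = 20   # assumed: ~4-byte int key + row pointer + per-entry overhead
FILL_FACTOR = 0.9  # assumed fraction of each index page actually used


def index_size_bytes(rows: int, entry_bytes: int, fill_factor: float) -> float:
    # Leaf entries dominate; inner B-tree pages add only a small percentage,
    # so they are ignored here.
    return rows * entry_bytes / fill_factor


size = index_size_bytes(ROWS, ENTRY_BYTES, FILL_FACTOR)
print(f"Estimated index size: {size / 2**30:.1f} GiB")  # prints "Estimated index size: 10.3 GiB"
```

    If the indexes your queries use, plus the rows they touch, fit in RAM, most lookups avoid the disk; if not, the disk subsystem becomes the bottleneck.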

    Both PostgreSQL and MySQL can easily handle 500 million rows, on proper hardware.

  • 2020-12-01 02:00

    I don't have much input on which is the best system to use, but perhaps this tip could help you get some of the speed you're looking for.

    If you're going to do exact matches on long varchar strings, especially ones longer than an index key allows, you can index a precomputed hash of the value:

    CREATE TABLE BigStrings (
       BigStringID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
       Value varchar(6000) NOT NULL,
       Chk AS (CHECKSUM(Value))
    );
    CREATE NONCLUSTERED INDEX IX_BigStrings_Chk ON BigStrings(Chk);
    
    -- Load 500 million rows into BigStrings

    DECLARE @S varchar(6000);
    SET @S = '6000-character-long string here';

    -- Nasty, slow table scan:
    SELECT * FROM BigStrings WHERE Value = @S;

    -- Fast: a nonclustered index seek on Chk, then key lookups into the clustered
    -- index; the Value = @S test discards any rows whose CHECKSUM merely collides:
    SELECT * FROM BigStrings WHERE Value = @S AND Chk = CHECKSUM(@S);
    

    This won't help if you aren't doing exact matches; in that case, look into full-text indexing. Either approach can make a dramatic difference to lookup speed on a 500-million-row table.
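    For trying the hash-column idea outside SQL Server, here is a minimal Python sketch of the same pattern using the standard-library sqlite3 module. It is an illustration under assumptions, not the answer's exact method: SQLite has no CHECKSUM(), so zlib.crc32 stands in for it, and the hash is computed in Python at insert time rather than by a computed column. The table, index, and helper names (big_strings, ix_big_strings_chk, str_hash, find, plan) are hypothetical. The query filters on the indexed hash so the planner can seek, and keeps the equality test on the full value to discard hash collisions.

```python
import sqlite3
import zlib


def str_hash(s: str) -> int:
    # Hypothetical stand-in for SQL Server's CHECKSUM(): CRC-32 of the UTF-8 bytes.
    return zlib.crc32(s.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE big_strings (
        id    INTEGER PRIMARY KEY,  -- plays the role of BigStringID
        value TEXT NOT NULL,        -- the long string itself
        chk   INTEGER NOT NULL      -- precomputed hash of value, like Chk
    )
    """
)
conn.execute("CREATE INDEX ix_big_strings_chk ON big_strings(chk)")

# 10,000 long strings stand in for the 500 million rows.
values = [f"row {i}: " + "x" * 1000 for i in range(10_000)]
conn.executemany(
    "INSERT INTO big_strings (value, chk) VALUES (?, ?)",
    ((v, str_hash(v)) for v in values),
)

FAST_SQL = "SELECT id FROM big_strings WHERE chk = ? AND value = ?"
SLOW_SQL = "SELECT id FROM big_strings WHERE value = ?"


def find(s: str) -> list:
    # chk = ? lets the planner seek the index; value = ? drops hash collisions.
    return conn.execute(FAST_SQL, (str_hash(s), s)).fetchall()


def plan(sql: str, params: tuple) -> str:
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


target = values[42]
print(find(target))  # [(43,)]
print(plan(FAST_SQL, (str_hash(target), target)))
print(plan(SLOW_SQL, (target,)))
```

    The exact wording of EXPLAIN QUERY PLAN output differs between SQLite versions, but the hash-plus-value query should report a search using ix_big_strings_chk, while the value-only query reports a full scan of big_strings.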

  • 2020-12-01 02:06

    Have you checked out Cassandra? http://cassandra.apache.org/
