partitioning or separating a very large table in mysql


We have a very large table in MySQL with 500,000,000 records in it, serving 100 SELECT requests per second.
This is the schema:

id (int),
user_id (int),
date (datetime),
content (text)

1 Answer
  • Your edit says you use queries like this at a rate of a third of a million per hour.

     SELECT content,user_id 
       FROM log
       JOIN users ON users.id = log.user_id
      WHERE date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
      LIMIT 15
    

    I will take the liberty of rewriting this query to fully qualify your column selections.

     SELECT log.content,
            log.user_id 
       FROM log                                  /* one half gigarow table */
       JOIN users ON users.id = log.user_id      /* two megarow table */
      WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
      LIMIT 15
    

    (Please consider updating your question if this is not correct.)

    Why are you joining the users table in this query? None of your results seem to come from it. Why won't this query do what you need?

     SELECT log.content,
            log.user_id 
       FROM log                                  /* one half gigarow table */
      WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
      LIMIT 15
    

    If you want to make this query faster, put a compound covering index on (date, user_id, content). That covering index will support the range scan and let MySQL satisfy the query from the index alone. If your content column is in fact of type TEXT (a LOB), you need to put just (date, user_id) into the covering index, and your retrieval will be a little slower.
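    For instance (a sketch; the index name is arbitrary, and which variant you use depends on the type of content):

     /* if content is an indexable string type such as VARCHAR
        short enough for MySQL's index key length limit: */
     ALTER TABLE log ADD INDEX covering (`date`, user_id, content);

     /* if content is TEXT (a LOB), leave it out of the index: */
     ALTER TABLE log ADD INDEX covering (`date`, user_id);
    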

    Are you using the JOIN to ensure that you only get back log entries which have a matching row in users? If so, please explain that in your question.
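    If that is the intent, an EXISTS subquery expresses the existence check directly without pulling any columns through a join. A sketch, assuming users.id is the primary key of users:

      SELECT log.content,
             log.user_id
        FROM log
       WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
         AND EXISTS (SELECT 1 FROM users WHERE users.id = log.user_id)
       LIMIT 15
    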

    You definitely can partition your table based on date ranges. But you will need to either alter your table or recreate and repopulate it, which will incur either downtime or a giant scramble.

    http://dev.mysql.com/doc/refman/5.6/en/partitioning-range.html

    Something like this DDL should then do the trick for you:

    CREATE TABLE log (
      id         INT NOT NULL AUTO_INCREMENT,  /*maybe BIGINT? */
      user_id    INT NOT NULL,
      `date`     DATETIME NOT NULL,
      content    TEXT,
      UNIQUE KEY (id, `date`),
      KEY covering (`date`,user_id)
    ) 
    PARTITION BY RANGE COLUMNS(`date`) (
        PARTITION p0 VALUES LESS THAN ('2012-01-01'),
        PARTITION p1 VALUES LESS THAN ('2012-07-01'),
        PARTITION p2 VALUES LESS THAN ('2013-01-01'),
        PARTITION p3 VALUES LESS THAN ('2013-07-01'),
        PARTITION p4 VALUES LESS THAN ('2014-01-01'),
        PARTITION p5 VALUES LESS THAN ('2014-07-01'),
        PARTITION p6 VALUES LESS THAN ('2015-01-01'),
        PARTITION p7 VALUES LESS THAN ('2015-07-01')
    );
    

    Notice that there's some monkey business about the UNIQUE KEY. The column that goes into your partitioning function also needs to appear in every unique key on the table (including the primary key, if you declare one). That's why `date` is part of the unique key here.
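    If you would rather convert the existing table in place than recreate and repopulate it, something along these lines should work. This is a sketch that assumes your current table declares PRIMARY KEY (id); either statement rebuilds the half-gigarow table, so plan for a long maintenance window:

     /* step 1: swap the primary key for a unique key that includes `date`
        (assumes the existing table has PRIMARY KEY (id)) */
     ALTER TABLE log DROP PRIMARY KEY, ADD UNIQUE KEY (id, `date`);

     /* step 2: partition in place; MySQL does not allow partitioning options
        to be combined with other alterations in a single ALTER TABLE */
     ALTER TABLE log PARTITION BY RANGE COLUMNS(`date`) (
         PARTITION p0 VALUES LESS THAN ('2012-01-01'),
         PARTITION p1 VALUES LESS THAN ('2012-07-01'),
         PARTITION p2 VALUES LESS THAN ('2013-01-01'),
         PARTITION p3 VALUES LESS THAN ('2013-07-01'),
         PARTITION p4 VALUES LESS THAN ('2014-01-01'),
         PARTITION p5 VALUES LESS THAN ('2014-07-01'),
         PARTITION p6 VALUES LESS THAN ('2015-01-01'),
         PARTITION p7 VALUES LESS THAN ('2015-07-01')
     );
    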

    Later on, when July 2015 (partition p7's cutoff date) draws near, you can run this statement to add a partition for the next six-month segment of time.

     ALTER TABLE `log`
       ADD PARTITION (PARTITION p8 VALUES LESS THAN ('2016-01-01'));
    

    But, seriously, none of this partitioning junk is going to help much if your queries have unnecessary joins or poor index coverage. And it is going to make your database administration more complex.
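    Whichever route you take, verify what the optimizer actually does. In MySQL 5.6, EXPLAIN PARTITIONS shows which partitions a query will touch, and the key column shows whether your covering index is being used. For example:

     EXPLAIN PARTITIONS
      SELECT log.content,
             log.user_id
        FROM log
       WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
       LIMIT 15;
    

    A recent 180-day range should prune down to the last partition or two; if the partitions column of the EXPLAIN output lists every partition, the partitioning is buying you nothing.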
