MySQL and NoSQL: Help me to choose the right one


予麋鹿 2020-11-22 03:54

There is a big table called threads with 1,000,000,000 rows (these threads actually exist; I'm not making things harder just because I enjoy it). Threads has only a few

5 answers

死守一世寂寞 (original poster)

2020-11-22 04:20
You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes (only available with InnoDB)!

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

Then design your system along the lines of the following simplified example:

Example schema (simplified)

The important features are that the tables use the InnoDB engine and that the primary key for the threads table is no longer a single auto_increment key but a composite clustered key combining forum_id and thread_id, e.g.:
```
threads - primary key (forum_id, thread_id)

forum_id    thread_id
========    =========
1                   1
1                   2
1                   3
1                 ...
1             2058300  
2                   1
2                   2
2                   3
2                  ...
2              2352141
...
```
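To make the clustering idea concrete, here is a minimal Python sketch (not MySQL code) that models the clustered index as a sorted list of (forum_id, thread_id) tuples. The forum sizes and the forum_range helper are made up for illustration. The point is that once rows are ordered by forum_id first, every thread belonging to one forum sits in a single contiguous run that two binary searches can locate, which is roughly what InnoDB gains from the composite clustered key.

```python
from bisect import bisect_left

# Hypothetical, tiny stand-in for the clustered primary key: rows are kept
# sorted by (forum_id, thread_id), the order InnoDB keeps its leaf pages in.
rows = sorted(
    (forum_id, thread_id)
    for forum_id, n_threads in {1: 3, 2: 4, 3: 2}.items()
    for thread_id in range(1, n_threads + 1)
)

def forum_range(rows, forum_id):
    """Return the contiguous slice holding every thread of one forum."""
    lo = bisect_left(rows, (forum_id, 0))      # first row of this forum
    hi = bisect_left(rows, (forum_id + 1, 0))  # first row of the next forum
    return rows[lo:hi]

print(forum_range(rows, 2))  # [(2, 1), (2, 2), (2, 3), (2, 4)]
```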
Each forum row includes a counter called next_thread_id (unsigned int), which is maintained by a trigger and incremented every time a thread is added to that forum. This also means we can store about 4 billion threads per forum, rather than 4 billion threads in total as we would with a single auto_increment primary key for thread_id.
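The capacity claim is simple arithmetic. The sketch below assumes MySQL's INT UNSIGNED range (0 to 4,294,967,295) and reuses the 250-forum figure from the test data in this answer; it is a back-of-the-envelope calculation, not a measured limit.

```python
# MySQL INT UNSIGNED is 32 bits wide: 0 .. 2**32 - 1.
INT_UNSIGNED_MAX = 2**32 - 1

forums = 250  # forum count used in this answer's test data

# A single auto_increment thread_id shared by every forum:
global_capacity = INT_UNSIGNED_MAX
# A per-forum counter restarts at 1 in each forum, so (forum_id, thread_id)
# can address INT_UNSIGNED_MAX threads in *each* forum:
per_forum_total = forums * INT_UNSIGNED_MAX

print(f"{global_capacity:,}")  # 4,294,967,295
print(f"{per_forum_total:,}")  # 1,073,741,823,750
```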
```
forum_id    title   next_thread_id
========    =====   ==============
1          forum 1        2058300
2          forum 2        2352141
3          forum 3        2482805
4          forum 4        3740957
...
64        forum 64       3243097
65        forum 65      15000000 -- ooh a big one
66        forum 66       5038900
67        forum 67       4449764
...
247      forum 247            0 -- still loading data for half the forums !
248      forum 248            0
249      forum 249            0
250      forum 250            0
```
The disadvantage of using a composite key is that you can no longer efficiently select a thread by a single key value, because thread_id on its own is neither unique nor the leading column of the index:
```
select * from threads where thread_id = y;
```
you have to do:
```
select * from threads where forum_id = x and thread_id = y;
```
However, your application code should already know which forum a user is browsing, so this is not difficult to implement: store the currently viewed forum_id in a session variable, a hidden form field, etc.
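A quick way to see the cost difference is to ask the database for its query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, on the assumption that a SQLite WITHOUT ROWID table (stored as a B-tree keyed on its primary key) is a fair analogue of an InnoDB clustered index. The exact plan wording differs between SQLite versions, but a lookup on thread_id alone should report a full SCAN, while the (forum_id, thread_id) lookup should report a SEARCH on the primary key. In MySQL itself you would run EXPLAIN on the two queries instead.

```python
import sqlite3

# SQLite stands in for InnoDB: a WITHOUT ROWID table is stored as a B-tree
# on its primary key, similar in spirit to a clustered index.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE threads (
        forum_id  INTEGER NOT NULL,
        thread_id INTEGER NOT NULL,
        PRIMARY KEY (forum_id, thread_id)
    ) WITHOUT ROWID
""")

def plan(sql, params):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# thread_id is not the leading key column, so SQLite must scan everything.
by_thread = plan("SELECT * FROM threads WHERE thread_id = ?", (42,))
# forum_id + thread_id matches the key prefix, so SQLite can seek directly.
by_forum_and_thread = plan(
    "SELECT * FROM threads WHERE forum_id = ? AND thread_id = ?", (65, 42)
)

print(by_thread)            # expect a full-table SCAN
print(by_forum_and_thread)  # expect a SEARCH using the primary key
```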

Here's the simplified schema:
```
-- forums: one row per forum, carrying the per-forum thread counter
drop table if exists forums;
create table forums
(
forum_id smallint unsigned not null auto_increment primary key,
title varchar(255) unique not null,
next_thread_id int unsigned not null default 0 -- count of threads in each forum
)engine=innodb;


-- threads: InnoDB clusters rows on the primary key, so each forum's
-- threads are stored together
drop table if exists threads;
create table threads
(
forum_id smallint unsigned not null,
thread_id int unsigned not null default 0, -- assigned by the trigger below
reply_count int unsigned not null default 0,
hash char(32) not null,
created_date datetime not null,
primary key (forum_id, thread_id, reply_count) -- composite clustered index
)engine=innodb;

delimiter #

create trigger threads_before_ins_trig before insert on threads
for each row
begin
declare v_id int unsigned default 0;

  -- Increment first: the UPDATE locks this forum's row until the
  -- transaction ends, so two concurrent inserts into the same forum
  -- cannot both read the same counter value and get the same thread_id.
  update forums set next_thread_id = next_thread_id + 1 where forum_id = new.forum_id;
  select next_thread_id into v_id from forums where forum_id = new.forum_id;
  set new.thread_id = v_id;
end#

delimiter ;
```
You may have noticed I've included reply_count as part of the primary key, which is a bit strange as the (forum_id, thread_id) composite is unique by itself. This is just an index optimisation which saves some I/O when queries that use reply_count are executed. Please refer to the 2 links above for further info on this.
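If you want to try the per-forum counter without a MySQL server, here is a rough port to Python's sqlite3, under a few stated assumptions. SQLite triggers cannot assign NEW.thread_id, so this version computes thread_id inside the INSERT itself and uses an AFTER INSERT trigger (the hypothetical threads_after_ins) only to bump forums.next_thread_id. The hash and created_date columns are dropped for brevity, and add_thread is a made-up helper. SQLite also serialises writers, so this sketch does not exercise the concurrent-insert case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forums (
        forum_id       INTEGER PRIMARY KEY,
        title          TEXT NOT NULL UNIQUE,
        next_thread_id INTEGER NOT NULL DEFAULT 0  -- threads created so far
    );

    -- hash and created_date omitted to keep the sketch short
    CREATE TABLE threads (
        forum_id    INTEGER NOT NULL,
        thread_id   INTEGER NOT NULL,
        reply_count INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (forum_id, thread_id, reply_count)
    ) WITHOUT ROWID;

    -- SQLite cannot assign NEW.thread_id in a BEFORE trigger, so the INSERT
    -- computes thread_id itself and this trigger only advances the counter.
    CREATE TRIGGER threads_after_ins AFTER INSERT ON threads
    BEGIN
        UPDATE forums SET next_thread_id = next_thread_id + 1
        WHERE forum_id = NEW.forum_id;
    END;
""")

def add_thread(forum_id, reply_count=0):
    """Insert a thread numbered one past the forum's current counter."""
    with conn:  # commit each insert as its own transaction
        conn.execute(
            """INSERT INTO threads (forum_id, thread_id, reply_count)
               VALUES (?, (SELECT next_thread_id + 1 FROM forums
                           WHERE forum_id = ?), ?)""",
            (forum_id, forum_id, reply_count),
        )

conn.executemany("INSERT INTO forums (forum_id, title) VALUES (?, ?)",
                 [(1, "forum 1"), (2, "forum 2")])
for f in (1, 1, 2, 1, 2):
    add_thread(f)

print(conn.execute("SELECT forum_id, thread_id FROM threads "
                   "ORDER BY forum_id, thread_id").fetchall())
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(conn.execute("SELECT forum_id, next_thread_id FROM forums "
                   "ORDER BY forum_id").fetchall())
# [(1, 3), (2, 2)]
```

Each forum numbers its own threads from 1, which is what lets (forum_id, thread_id) act as the clustered key.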

Example queries

I'm still loading data into my example tables, and so far I have loaded approx. 500 million rows (half as many as your system). When the load process is complete I expect to have approx.:
```
250 forums * 5 million threads = 1,250,000,000 (1.25 billion rows)
```
I've deliberately made some of the forums contain more than 5 million threads; for example, forum 65 has 15 million threads:
```
forum_id    title   next_thread_id
========    =====   ==============
65        forum 65      15000000 -- ooh a big one
```
Query runtimes
```
select sum(next_thread_id) from forums;

sum(next_thread_id)
===================
539,155,433 (500 million threads so far and still growing...)
```
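Under InnoDB, summing the next_thread_id values to get a total thread count is much faster than the usual query below, because InnoDB does not keep an exact row count, so count(*) has to scan an index: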
```
select count(*) from threads;
```
How many threads does forum 65 have?
```
select next_thread_id from forums where forum_id = 65;

next_thread_id
==============
15,000,000 (15 million)
```
Again, this is faster than the usual:
```
select count(*) from threads where forum_id = 65;
```
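Both shortcuts read one small row per forum instead of walking an index over every thread, and they rely on the counter staying in step with the real data. Here is a minimal sketch of a periodic sanity check, again using Python's sqlite3 as a stand-in for InnoDB. It assumes threads are only ever inserted, never deleted, since next_thread_id counts threads created rather than threads remaining; the trigger name bump and the random test data are made up.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forums (forum_id INTEGER PRIMARY KEY,
                         next_thread_id INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE threads (forum_id INTEGER NOT NULL,
                          thread_id INTEGER NOT NULL,
                          PRIMARY KEY (forum_id, thread_id)) WITHOUT ROWID;
    -- hypothetical trigger: one counter bump per inserted thread
    CREATE TRIGGER bump AFTER INSERT ON threads BEGIN
        UPDATE forums SET next_thread_id = next_thread_id + 1
        WHERE forum_id = NEW.forum_id;
    END;
""")
conn.executemany("INSERT INTO forums (forum_id) VALUES (?)",
                 [(f,) for f in range(1, 11)])

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    f = rng.randint(1, 10)
    conn.execute("""INSERT INTO threads VALUES
                    (?, (SELECT next_thread_id + 1 FROM forums
                         WHERE forum_id = ?))""", (f, f))

# Cheap: reads one row per forum.
fast_total = conn.execute("SELECT SUM(next_thread_id) FROM forums").fetchone()[0]
# Expensive on a large InnoDB table: touches every thread.
slow_total = conn.execute("SELECT COUNT(*) FROM threads").fetchone()[0]

# Per-forum check: each counter should equal that forum's real row count.
mismatches = conn.execute("""
    SELECT f.forum_id FROM forums f
    WHERE f.next_thread_id <> (SELECT COUNT(*) FROM threads t
                               WHERE t.forum_id = f.forum_id)
""").fetchall()

print(fast_total, slow_total, mismatches)  # 1000 1000 []
```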
OK, now we know we have about 500 million threads so far and forum 65 has 15 million threads. Let's see how the schema performs :)
```
select forum_id, thread_id from threads where forum_id = 65 and reply_count > 64 order by thread_id desc limit 32;

runtime = 0.022 secs

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 1 order by thread_id desc limit 10000, 100;

runtime = 0.027 secs
```
Looks pretty performant to me: that's a single table with 500+ million rows (and growing), and queries against a forum holding 15 million of those rows return in about 0.02 seconds (while under load!)
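A likely reason these queries stay fast is that forum_id = 65 pins the search to one contiguous range of the clustered key, and ORDER BY thread_id DESC can be satisfied by reading that range backwards, so the LIMIT stops the scan early instead of sorting millions of rows. The sketch below (hypothetical data, SQLite standing in for InnoDB) runs the first query above against a small table and prints the plan. The plan should show a SEARCH on the primary key; whether a separate "TEMP B-TREE" sort step appears is worth checking on your own SQLite version, and in MySQL you would use EXPLAIN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE threads (
        forum_id    INTEGER NOT NULL,
        thread_id   INTEGER NOT NULL,
        reply_count INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (forum_id, thread_id, reply_count)
    ) WITHOUT ROWID
""")
# Hypothetical data: 3 forums x 2,000 threads, reply_count cycling 0..99.
conn.executemany(
    "INSERT INTO threads VALUES (?, ?, ?)",
    [(f, t, t % 100) for f in (64, 65, 66) for t in range(1, 2001)],
)

query = """
    SELECT forum_id, thread_id FROM threads
    WHERE forum_id = ? AND reply_count > ?
    ORDER BY thread_id DESC LIMIT 32
"""
rows = conn.execute(query, (65, 64)).fetchall()
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, (65, 64))]

print(rows[:3])  # [(65, 1999), (65, 1998), (65, 1997)]
print(plan)      # inspect: primary-key SEARCH, and any sort step
```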

Further optimisations

These would include:
- partitioning by range
- sharding
- throwing money and hardware at it
etc...
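As one illustration of the first two items, the sketch below routes forums to shards by forum_id range in application code. The shard boundaries, shard names and the shard_for function are all hypothetical. Range routing on forum_id suits this schema because every query already carries forum_id. MySQL's own PARTITION BY RANGE works on the same idea, and it requires every column in the partitioning expression to be part of every unique key on the table, which forum_id (the leading primary key column) satisfies.

```python
from bisect import bisect_right

# Hypothetical layout: each entry is the first forum_id stored on a shard.
SHARD_STARTS = [1, 64, 128, 192]
SHARD_NAMES = ["shard_a", "shard_b", "shard_c", "shard_d"]  # made-up names

def shard_for(forum_id: int) -> str:
    """Map a forum_id to the shard whose range contains it."""
    if forum_id < SHARD_STARTS[0]:
        raise ValueError(f"forum_id {forum_id} is below the first range")
    # bisect_right finds the first start greater than forum_id; step back one.
    return SHARD_NAMES[bisect_right(SHARD_STARTS, forum_id) - 1]

print(shard_for(1), shard_for(65), shard_for(250))  # shard_a shard_b shard_d
```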

Hope you find this answer helpful :)