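I naively assumed that when tables are split using MySQL's MERGE engine, MySQL would automatically use multiple threads to search the child tables concurrently. So far I have not found any setting for such a mechanism, and a simple test shows that MySQL still scans the child tables one after another, taking about as long as it would without splitting.
Once a single MySQL table reaches tens of millions of rows, it is best to break it into smaller units, either by partitioning or by splitting it into several tables. Here we look at table splitting.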
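For readers who have not used the MERGE engine, here is a minimal sketch of how a MERGE table over four child tables could be declared. The table name `data_merge`, the helper `build_merge_ddl` and the column list are my own assumptions (only `id` and `title` appear in the queries later on); the child tables would need to be MyISAM tables with the same structure.

```python
def build_merge_ddl(merge_table, child_tables, columns_sql):
    """Return a CREATE TABLE statement for a MERGE table over child_tables."""
    if not child_tables:
        raise ValueError("a MERGE table needs at least one child table")
    union = ", ".join("`%s`" % name for name in child_tables)
    return (
        "CREATE TABLE `%s` (%s) ENGINE=MERGE UNION=(%s) INSERT_METHOD=LAST"
        % (merge_table, columns_sql, union)
    )


# Hypothetical column list: the real child tables must share this exact structure.
COLUMNS = "`id` INT NOT NULL, `title` VARCHAR(255) NOT NULL, KEY (`id`)"

children = ["data_%d" % n for n in range(1, 5)]
ddl = build_merge_ddl("data_merge", children, COLUMNS)
print(ddl)
```

A query against such a table is still a single statement on a single connection, which is why the rest of this post fans the work out from the client side instead.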
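Scenario
Main single table: data, 400000w rows (figures as written in the original; "w" is shorthand for 万, i.e. 10,000)
Sub-tables: data_1 100000w, data_2 100000w, data_3 100000w, data_4 100000w
module/mysql.py is a MySQL DAO that I wrapped myself.
mysql_task.py is the test script for this experiment. It starts four child threads to search the four sub-tables, and then the main thread searches the main single table on its own.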
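The five statements involved (one per sub-table plus one for the main table) can be sketched as follows. The helper name `count_sql` is hypothetical; the SQL text itself is the one used in the test script.

```python
def count_sql(table, keyword):
    """Build the count query used in the test for one table."""
    # The table name and keyword are interpolated directly, as in the post,
    # so this is only suitable for trusted, hard-coded values.
    return (
        "select sql_no_cache count(`id`) from `%s` where `title` like '%%%s%%'"
        % (table, keyword)
    )


shard_tables = ["data_%d" % n for n in range(1, 5)]
shard_queries = [count_sql(t, "hello") for t in shard_tables]
main_query = count_sql("data", "hello")

for q in shard_queries + [main_query]:
    print(q)
```

Because the pattern starts with a wildcard, an ordinary index on `title` generally cannot be used, so each statement ends up scanning its whole table. That makes it a convenient workload for comparing one large scan with four smaller ones.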
module/mysql.py
#! /usr/bin/env python
# -*-coding:utf-8-*-
"""
mysql DAO
A quick sketch, so please overlook the dependency-injection issues in this code
"""
import MySQLdb


class Mysql(object):
    def __init__(self, host, user, passwd, db, port=3306):
        self.host = host
        self.user = user
        self.passwd = passwd
        self.db = db
        self.port = port
        self.connect()

    def connect(self):
        # Open the connection and keep one cursor for all later statements.
        self.conn = MySQLdb.connect(host=self.host, user=self.user, passwd=self.passwd, db=self.db, port=self.port)
        self.cursor = self.conn.cursor()

    def execute(self, sql):
        # Run a statement and return whatever the cursor reports (the row count).
        result = self.cursor.execute(sql)
        return result

    def query(self, sql):
        # Return every row of the result set.
        self.cursor.execute(sql)
        result = self.cursor.fetchall()
        return result

    def scaler(self, sql):
        # Return the first column of the first row, e.g. the value of count(...).
        self.cursor.execute(sql)
        result = self.cursor.fetchone()
        return result[0]

    def one(self, sql):
        # Return only the first row.
        self.cursor.execute(sql)
        result = self.cursor.fetchone()
        return result

    def __del__(self):
        # Release the cursor and the connection when the object is collected.
        self.cursor.close()
        self.conn.close()
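module/__init__.py (modular programming; read up on Python packages if this is unfamiliar)
#! /usr/bin/env python
# -*-coding:utf-8-*-
"""
Register the Mysql class in the module package
"""
# Register Mysql from mysql.py in the module package,
# so that outside code can reach the class with `from module import Mysql`.
# The explicit relative import works on both Python 2 and Python 3.
from .mysql import Mysql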
mysql_task.py
#! /usr/bin/env python
"""
Compare four threads, each counting over one sub-table, with a single query over the main table
"""
__author__ = 'sallency'
from module import Mysql
from threading import Thread
import time

result = []


class MyThread(Thread):
    def __init__(self, no):
        Thread.__init__(self)
        # Sub-table number: this thread queries `data_<no>`.
        self.no = no

    def run(self):
        # Every thread opens its own connection instead of sharing one.
        dbCon = Mysql('localhost', 'root', '123456', 'mydb')
        result.append(dbCon.scaler("select sql_no_cache count(`id`) from `data_" + str(self.no) + "` where `title` like '%hello%'"))


# start sub threads
def task():
    threads = [MyThread(no) for no in range(1, 5)]
    for thr in threads:
        thr.start()
    # Wait until all four sub-table queries have finished.
    for thr in threads:
        thr.join()
    return True


if __name__ == "__main__":
    # print(...) with a single argument behaves the same on Python 2 and 3.
    print("")
    print("...... multi thread query start ......")
    print(time.ctime() + ' / ' + str(time.time()))
    task()
    print(result)
    print(time.ctime() + ' / ' + str(time.time()))
    print("...... multi thread query end ......")
    print("")
    dbCon = Mysql('localhost', 'root', '123456', 'mydb')
    print("...... single thread query start ......")
    print(time.ctime() + ' / ' + str(time.time()))
    print(dbCon.scaler("select sql_no_cache count(`id`) from `data` where `title` like '%hello%'"))
    print(time.ctime() + ' / ' + str(time.time()))
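The same fan-out can also be written with a thread pool. This is only a sketch under a few assumptions: it targets Python 3 (for `concurrent.futures`), `count_all` and `fake_count` are hypothetical names, and `fake_count` stands in for the real database call with made-up numbers so the snippet runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_all(tables, count_fn, max_workers=4):
    """Run count_fn(table) for every table concurrently; return (counts, seconds)."""
    started = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `tables`, not completion order.
        counts = list(pool.map(count_fn, tables))
    return counts, time.time() - started


def fake_count(table):
    # Stand-in for a real query: sleeps briefly, then returns a made-up count.
    time.sleep(0.05)
    return {"data_1": 10, "data_2": 20, "data_3": 30, "data_4": 40}[table]


counts, seconds = count_all(["data_1", "data_2", "data_3", "data_4"], fake_count)
print(counts, sum(counts))
```

In a real run, `count_fn` would create its own `Mysql` instance on each call and issue the count query for that table, since a MySQLdb connection is not meant to be shared between threads.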
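Test results
Query results: 219 + 207 + 156 + 254 == 836, true.
The multi-threaded run took 1.8 seconds and the single-threaded run took 6.12 seconds, so the elapsed time dropped by 70.59% (roughly 3.4 times faster).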
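The percentage follows directly from the two timings. A small sketch of the arithmetic, with hypothetical helper names:

```python
def time_saved_percent(before, after):
    """Share of the original elapsed time that was saved, in percent."""
    return (before - after) / float(before) * 100.0


def speedup(before, after):
    """How many times faster the second run is."""
    return before / float(after)


single, multi = 6.12, 1.8
print(round(time_saved_percent(single, multi), 2))  # 70.59
print(round(speedup(single, multi), 2))             # 3.4
```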
You could certainly reach the right conclusion by reasoning alone, without writing any code, but it still feels good to try it hands-on, haha.
Source: oschina
Link: https://my.oschina.net/u/252076/blog/677861