I'm currently trying to improve the speed of SELECTs on a MySQL table and would appreciate any suggestions.
We have over 300 million records.
I would do two things. First, add an index covering tag and date, as suggested above:
ALTER TABLE `table` ADD INDEX (tag, date);
Next, break your query into a main query and a sub-select, so the sub-select narrows the result set before the main query runs:
SELECT date, value
FROM `table`
WHERE date BETWEEN 'x' AND 'y'
AND tag IN ( SELECT tag FROM `table` WHERE tag = 'a' )
ORDER BY date;
I would guess that adding an index on (tag, date) would help:
ALTER TABLE `table` ADD INDEX (tag, date);
Please post the result of an EXPLAIN on this query (EXPLAIN SELECT date, value FROM ......).
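It would look something like this (following the query shape above; the table name is a placeholder):

EXPLAIN SELECT date, value
FROM `table`
WHERE tag = 'a'
AND date BETWEEN 'x' AND 'y'
ORDER BY date;

In the output, the key column shows which index was chosen and rows is MySQL's estimate of how many rows it will examine.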
I'd say your only chance to further improve it is a covering index with all three columns (tag, date, value). That avoids the table access entirely.
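For example (the index name is illustrative):

ALTER TABLE `table` ADD INDEX idx_tag_date_value (tag, date, value);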
I don't think that partitioning can help with that.
take the time to read my answer here (it deals with similar volumes to yours):
MySQL and NoSQL: Help me to choose the right one
(500 million rows, 15 million row range scan in 0.02 seconds)
then amend your table engine to innodb as follows:
create table tag_date_value
(
tag_id smallint unsigned not null, -- i prefer ints to chars
tag_date datetime not null, -- can we make this date vs datetime ?
value int unsigned not null default 0, -- or whatever datatype you require
primary key (tag_id, tag_date) -- clustered composite PK
)
engine=innodb;
you might consider the following as the primary key instead:
primary key (tag_id, tag_date, value) -- adding value saves some I/O
but only if value isn't some LARGE varchar type!
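since tag_id is an int here, you'd map your existing string tags to ids through a small lookup table - something like this (names and lengths are illustrative):

create table tag
(
tag_id smallint unsigned not null auto_increment,
tag_name varchar(64) not null, -- whatever your real tags look like
primary key (tag_id),
unique key (tag_name)
)
engine=innodb;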
query as before:
select
tag_date,
value
from
tag_date_value
where
tag_id = 1 and
tag_date between 'x' and 'y'
order by
tag_date;
hope this helps :)
EDIT
oh forgot to mention - don't use alter table to change the engine type from myisam to innodb, but rather dump the data out into csv files and re-import into a newly created and empty innodb table.
note i'm ordering the data during the export process - clustered indexes are the KEY!
Export
select * into outfile 'tag_date_value_001.dat'
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
from
tag_date_value
where
tag_id between 1 and 50
order by
tag_id, tag_date;
select * into outfile 'tag_date_value_002.dat'
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
from
tag_date_value
where
tag_id between 51 and 100
order by
tag_id, tag_date;
-- etc...
Import
import back into the table in the correct order!
start transaction;
load data infile 'tag_date_value_001.dat'
into table tag_date_value
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
(
tag_id,
tag_date,
value
);
commit;
-- etc...
What is the cardinality of the date field (that is, how many different values appear in that field)? If date BETWEEN 'x' AND 'y' is more selective than the tag = 'a' part of the WHERE clause, try making your primary key (date, tag) instead of (tag, date), so that date is the leading column of the index.
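You can check both cardinalities directly, for example:

SELECT COUNT(DISTINCT date) AS date_values,
COUNT(DISTINCT tag) AS tag_values
FROM `table`;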
Also, be careful how you specify 'x' and 'y' in your WHERE clause. In some circumstances MySQL will cast the date field on every row to match the non-date type implied by the values you compare against, which prevents the index from being used.
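For instance, assuming date is a DATE column, numeric literals can make MySQL compare the values as numbers, while string literals are converted to dates once:

-- risky: may force a per-row cast of the date column
SELECT date, value FROM `table` WHERE date BETWEEN 20100101 AND 20100201;

-- safer: the constants are converted once and the index stays usable
SELECT date, value FROM `table` WHERE date BETWEEN '2010-01-01' AND '2010-02-01';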
Try inserting just the needed dates into a temporary table, then finish with a select on the temporary table for the tag and the ordering.
CREATE TEMPORARY TABLE foo
SELECT tag, date, value
FROM `table`
WHERE date BETWEEN 'x' AND 'y';
ALTER TABLE foo ADD INDEX idx_tag (tag);
SELECT date, value
FROM foo
WHERE tag = 'a'
ORDER BY date;
If that doesn't work, try creating foo off the tag selection instead.
CREATE TEMPORARY TABLE foo
SELECT date, value
FROM `table`
WHERE tag = 'a';
ALTER TABLE foo ADD INDEX idx_date (date);
SELECT date, value
FROM foo
WHERE date BETWEEN 'x' AND 'y'
ORDER BY date;