问题
I have a large database with EAV structured data that has to be searchable and pageable. I tried every trick in my book to get it fast enough, but under certain circumstances, it still fails to complete in a reasonable time.
This is my table structure (relevant parts only, ask away if you need more):
CREATE TABLE IF NOT EXISTS `object` (
`object_id` bigint(20) NOT NULL AUTO_INCREMENT,
`oid` varchar(32) CHARACTER SET utf8 NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`object_id`),
UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `version` (
`version_id` bigint(20) NOT NULL AUTO_INCREMENT,
`type_id` bigint(20) NOT NULL,
`object_id` bigint(20) NOT NULL,
`created` datetime NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`version_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `value` (
`value_id` bigint(20) NOT NULL AUTO_INCREMENT,
`object_id` int(11) NOT NULL,
`attribute_id` int(11) NOT NULL,
`version_id` bigint(20) NOT NULL,
`type_id` bigint(20) NOT NULL,
`value` text NOT NULL,
PRIMARY KEY (`value_id`),
KEY `field_id` (`attribute_id`),
KEY `action_id` (`version_id`),
KEY `form_id` (`type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is a sample object. I have around 1 million of those in my database. each object may have different number of attributes with different attribute_id
INSERT INTO `owner` (`owner_id`, `uid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `object` (`object_id`, `type_id`, `owner_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');
Now on to my current mechanism. My first try was the typical approach to Mysql. Do one huge SQL with loads of joins on anything I require. Complete desaster! Took way to long to load and even crashed the PHP and MySQL servers due to exhausted RAM.
So I split my queries up into several steps:
1 Determine all needed attribute_ids.
I can look them up in another table that references the type_id of an object. The result is a list of attribute_ids. (this table is not very relevant to the performance, so it's not included in my sample.)
:type_id contains all type_ids from any objects I want to include in my search. I already got this information in my application. So this is inexpensive.
SELECT * FROM attribute WHERE form_id IN (:type_id)
Result is an array of type_id integers.
2 Search for matching objects A big SQL query is compiled that adds one INNER JOIN for each and every condition I want. This sounds horrible, but in the end, it was the fastest method :(
A typical generated query might look like this. The LIMIT sadly is necessary or I will potentially get so many IDs that the resulting array makes PHP explode or break the IN statement in the next Query:
SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1
ON `version`.version_id = condition1.version_id
AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2
ON `version`.version_id = condition2.version_id
AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3
ON `version`.version_id = condition3.version_id
AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000
The result will contain all object_ids from any object I might need. I am selecting object_ids and not version_ids as I need to have all versions of matching objects, regardless of which version matched.
3 Sort and page results Next I will create a query that sorts the objects by a certain attribute and then pages the resulting array.
SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset
The result is a sorted and paged list of object ids from former search
4 Get our complete objects, versions and attributes In the last step, I will select all values for any objects and versions the former queries found.
SELECT `value`.*, `object`.*, `version`.*, `type`.*
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`,
FROM version
INNER JOIN `type` ON `version`.form_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version.type_id IN (:typeIds)
ORDER BY version.created DESC
The result will then be compiled via PHP into nice object->version->value array structures.
Now the question:
- Can this whole mess be accelerated in any way?
- Can I somehow remove the LIMIT 10000 restriction from my search query?
If all else fails, maybe switch database technology? See my other question: Database optimized for searching in large number of objects with different attributes
Real Life samples
Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows
SELECT * FROM attribute WHERE attribute_id IN (30)
SELECT DISTINCT `version`.object_id
FROM version
INNER JOIN value AS condition_d4e328e33813
ON version.version_id = condition_d4e328e33813.version_id
AND condition_d4e328e33813.type_id IN (30)
AND condition_d4e328e33813.attribute_id IN (377)
AND condition_d4e328e33813.value LIKE '%e%'
INNER JOIN value AS condition_2c870b0a429f
ON version.version_id = condition_2c870b0a429f.version_id
AND condition_2c870b0a429f.type_id IN (30)
AND condition_2c870b0a429f.attribute_id IN (376)
AND condition_2c870b0a429f.value LIKE '%s%'
WHERE version.type_id IN (30)
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE condition_2c870b0a429f ref field_id,action_id,form_id field_id 4 const 178639 Using where; Using temporary; Using filesort
1 SIMPLE action eq_ref PRIMARY PRIMARY 8 condition_2c870b0a429f.action_id 1 Using where
1 SIMPLE condition_d4e328e33813 ref field_id,action_id,form_id action_id 8 action.action_id 11 Using where; Distinct
objects search completed (Peak RAM: 5.91MB, Time: 4.64s)
SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0
objects sort completed (Peak RAM: 6.68MB, Time: 0.352s)
SELECT `value`.*, object.*, version.*, type.*,
object.status AS `object.status`,
object.flag AS `object.flag`,
version.created AS `version.created`,
version.status AS `version.status`,
version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC
objects load query completed (Peak RAM: 6.68MB, Time: 0.083s)
objects compilation into arrays completed (Peak RAM: 6.68MB, Time: 0.007s)
回答1:
Just try to add an EXPLAIN before your search query :
EXPLAIN SELECT DISTINCT `version`.object_id FROM `version`, etc ...
then check the results in the "Extra" column, it will give you some clues to speedup your query, like adding INDEX on the right fields.
Also some times you can removeINNER JOIN, get more results in your Mysql response and filter the big array by processing with PHP loops.
回答2:
I would start by trying to have covering indexes (ie: all columns to match the criteria you are querying on and even pulling out as result). This way the engine does not have to go back to the raw page data.
Since you need the "object_id" from version, and using the "version_id" as join basis to the other tables. Your version table also has a WHERE clause on the TYPE_ID, so I would have an index on
version table -- (object_id, version_id, type_id)
For your "value" table, match there too for criteria
value table -- ( version_id, type_id, attribute_id, value, created )
来源:https://stackoverflow.com/questions/21412479/need-mysql-optimization-for-complex-search-on-eav-structured-data