Question
I have hierarchical data stored in a nested set model, in a table named projects:
id, lft, rgt
1, 1, 6
2, 2, 3
3, 4, 5
4, 7, 10
5, 8, 9
6, 11, 12
7, 13, 14
...
Pretty printed:
1
  2
  3
4
  5
6
7
To find the nearest super node of node 3 (knowing its lft value, 4), I can run:
explain
SELECT projects.*
FROM projects
WHERE 4 BETWEEN projects.lft AND projects.rgt
This gives me the list of projects on the path down to node 3. By then taking MAX(projects.lft) over those results, I get the nearest super node. However, I cannot get this query to run fast; it won't use the indexes I've defined. EXPLAIN says:
+----+-------------+----------+-------+----------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+----------------+----------+---------+------+------+--------------------------+
| 1 | SIMPLE | projects | index | lft,rgt,lftRgt | idLftRgt | 12 | NULL | 10 | Using where; Using index |
+----+-------------+----------+-------+----------------+----------+---------+------+------+--------------------------+
MySQL picks an index to use, but still has to loop through all 10 rows (or 100k in my actual table).
How can I get MySQL to optimize this query properly? A test script is included below.
DROP TABLE IF EXISTS projects;
CREATE TABLE projects (
id INT NOT NULL ,
lft INT NOT NULL ,
rgt INT NOT NULL ,
PRIMARY KEY ( id )
) ENGINE = MYISAM ;
ALTER TABLE projects ADD INDEX lft (lft);
ALTER TABLE projects ADD INDEX rgt (rgt);
ALTER TABLE projects ADD INDEX lftRgt (lft, rgt);
ALTER TABLE projects ADD INDEX idLftRgt (id, lft, rgt);
INSERT INTO projects (id,lft,rgt) VALUES (1,1,6);
INSERT INTO projects (id,lft,rgt) VALUES (2,2,3);
INSERT INTO projects (id,lft,rgt) VALUES (3,4,5);
INSERT INTO projects (id,lft,rgt) VALUES (4,7,10);
INSERT INTO projects (id,lft,rgt) VALUES (5,8,9);
INSERT INTO projects (id,lft,rgt) VALUES (6,11,12);
INSERT INTO projects (id,lft,rgt) VALUES (7,13,14);
INSERT INTO projects (id,lft,rgt) VALUES (8,15,16);
INSERT INTO projects (id,lft,rgt) VALUES (9,17,18);
INSERT INTO projects (id,lft,rgt) VALUES (10,19,20);
explain
SELECT projects.*
FROM projects
WHERE 4 BETWEEN projects.lft AND projects.rgt
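The "take MAX(projects.lft) of the results" step described above can also be folded into the query itself. This is only a minimal sketch, assuming node 3 has lft = 4 as in the test data; the strict comparisons (which exclude the node itself) and the alias p are choices made for this illustration, not part of the original query:

```sql
-- Nearest super node of the node whose lft is 4 (node 3 in the test data).
-- Strict comparisons exclude the node itself; among the remaining
-- ancestors, the one with the largest lft is the closest.
SELECT p.*
FROM projects p
WHERE p.lft < 4
  AND p.rgt > 4
ORDER BY p.lft DESC
LIMIT 1;
```

With the test data this returns node 1 (lft 1, rgt 6). It only restates the lookup; whether it runs any faster depends on how the optimizer uses the indexes, so it is worth checking with EXPLAIN.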
Answer 1:
To optimize nested set queries in MySQL, you should create a SPATIAL (R-Tree) index on the set boxes:
ALTER TABLE projects ADD sets LINESTRING;
UPDATE projects
SET sets = LineString(Point(-1, lft), Point(1, rgt));
ALTER TABLE projects MODIFY sets LINESTRING NOT NULL;
CREATE SPATIAL INDEX sx_projects_sets ON projects (sets);
SELECT hp.*
FROM projects hp
WHERE MBRWithin(Point(0, 4), hp.sets)
ORDER BY
lft;
See this article in my blog for more detail:
- Adjacency list vs. nested sets: MySQL
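To get only the nearest super node rather than the whole path, the spatial query above could be narrowed as follows. This is a sketch under the assumption that the sets column and the sx_projects_sets index have been created as shown, and that the node being looked up has lft = 4 (node 3 in the question's data). The extra lft condition is added here to exclude the node itself without relying on how MBRWithin treats points on a box boundary:

```sql
-- Ancestors of the node with lft = 4, found through the spatial index,
-- then reduced to the closest one (the ancestor with the largest lft).
SELECT hp.*
FROM projects hp
WHERE MBRWithin(Point(0, 4), hp.sets)
  AND hp.lft < 4          -- exclude the node itself
ORDER BY hp.lft DESC
LIMIT 1;
```

Running the same statement with EXPLAIN in front should show whether sx_projects_sets is actually chosen on your MySQL version.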
Answer 2:
If you can't use the spatial index, then these two indexes should be declared unique, which will help the database a lot:
ALTER TABLE projects ADD INDEX lftRgt (lft, rgt);
ALTER TABLE projects ADD INDEX idLftRgt (id, lft, rgt);
This index, on the other hand, is not necessary, because lft is already the leftmost column of lftRgt:
ALTER TABLE projects ADD INDEX lft (lft);
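Applied to the test script from the question, the index changes suggested in this answer might look like the sketch below. It assumes the indexes lft, lftRgt and idLftRgt already exist under those names, and that the lft and rgt values really are unique in your data (otherwise creating the unique indexes will fail):

```sql
-- Drop the single-column index: lft is already the leftmost column of lftRgt.
DROP INDEX lft ON projects;

-- Recreate the two composite indexes as UNIQUE.
DROP INDEX lftRgt ON projects;
CREATE UNIQUE INDEX lftRgt ON projects (lft, rgt);

DROP INDEX idLftRgt ON projects;
CREATE UNIQUE INDEX idLftRgt ON projects (id, lft, rgt);
```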
Answer 3:
I came across this while trying to find help on indexing for nested sets.
I ended up with a different solution, which is bulky but easy to index fully. It makes updates even slower, but I am posting it here as it might help others.
We have a table of product categories, which can have sub categories, etc. This data is quite static.
I set up a table that caches the relationships between categories: for each category it holds one row per parent category (including the category itself), along with the difference in depth.
When a change is made to the actual category table, I just trigger a procedure to rebuild the cache table.
Then anything that is checking for the parent / child relationship can just use the cache to link directly between a category and all its children (or a child and all its parents).
The actual category table:
CREATE TABLE `category` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) NOT NULL,
`depth` int(11) NOT NULL,
`left_index` int(4) NOT NULL,
`right_index` int(4) NOT NULL,
`mmg_code` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `mmg_code` (`mmg_code`),
UNIQUE KEY `left_index_right_index` (`left_index`,`right_index`),
UNIQUE KEY `depth_left_index_right_index` (`depth`,`left_index`,`right_index`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
DELIMITER ;;
CREATE TRIGGER `category_ai` AFTER INSERT ON `category` FOR EACH ROW
CALL `proc_rebuild_category_parents_cache`();;
CREATE TRIGGER `category_au` AFTER UPDATE ON `category` FOR EACH ROW
CALL `proc_rebuild_category_parents_cache`();;
DELIMITER ;
The simple cache table:
CREATE TABLE `category_parents_cache` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`category_id` int(11) NOT NULL,
`parent_category_id` int(11) NOT NULL,
`depth_difference` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `category_id` (`category_id`),
KEY `parent_category_id` (`parent_category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
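The procedure:
DELIMITER ;;
CREATE PROCEDURE `proc_rebuild_category_parents_cache`()
BEGIN
-- Empty the cache. DELETE is used rather than TRUNCATE because TRUNCATE causes an
-- implicit commit, which MySQL does not allow in a procedure called from a trigger.
DELETE FROM category_parents_cache;
-- One row per (category, parent) pair, including the category itself (depth_difference = 0).
INSERT INTO category_parents_cache (id, category_id, parent_category_id, depth_difference)
SELECT NULL,
child_category.id AS category_id,
category.id AS parent_category_id,
child_category.depth - category.depth AS depth_difference
FROM category
INNER JOIN category child_category ON child_category.left_index BETWEEN category.left_index AND category.right_index
ORDER BY category.id, child_category.id;
END;;
DELIMITER ;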
This could probably be improved if the table is large and frequently updated.
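As an illustration of how the cache table described in this answer might be queried, here is a sketch. The category id 42 is a hypothetical value, and the queries assume that depth increases by exactly 1 per level, so that a direct parent has depth_difference = 1:

```sql
-- Nearest parent (the "super node") of category 42.
SELECT parent.*
FROM category_parents_cache cpc
INNER JOIN category parent ON parent.id = cpc.parent_category_id
WHERE cpc.category_id = 42
  AND cpc.depth_difference = 1;

-- All descendants of category 42, excluding the category itself.
SELECT child.*
FROM category_parents_cache cpc
INNER JOIN category child ON child.id = cpc.category_id
WHERE cpc.parent_category_id = 42
  AND cpc.depth_difference > 0;
```

Both lookups filter on a single equality column (category_id or parent_category_id), which the two indexes on the cache table cover, so no range scan over left_index and right_index is needed.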
Source: https://stackoverflow.com/questions/1743894/mysql-optimizing-finding-super-node-in-nested-set-tree