问题
I am not so into database and I have the following doubt about the possible optimization of this query (defined on a MySql database):
SELECT MCPS.id AS series_id,
MD_CD.market_details_id AS market_id,
MD_CD.commodity_details_id AS commodity_id,
MD.market_name AS market_name,
MCPS.price_date AS price_date,
MCPS.avg_price AS avg_price,
CU.ISO_4217_cod AS currency,
MU.unit_name AS measure_unit,
CD.commodity_name_en,
CN.commodity_name
FROM Market_Commodity_Price_Series AS MCPS
INNER JOIN MeasureUnit AS MU ON MCPS.measure_unit_id = MU.id
INNER JOIN Currency AS CU ON MCPS.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails AS MD ON MD_CD.market_details_id = MD.id
INNER JOIN CommodityDetails AS CD ON MD_CD.commodity_details_id = CD.id
INNER JOIN CommodityName AS CN ON CD.id = CN.commodity_details_id
INNER JOIN Languages AS LN ON CN.language_id = LN.id
WHERE MD.id = 4
AND CD.id = 4
AND LN.id=1
ORDER BY price_date DESC LIMIT 1
It contains a lot of join only because the information in the tables are very normalized.
This query returns the information related the last price of a commodity (AND CD.id = 4) in a specific market (**WHERE MD.id = 4).
So I am first retrieving the list of all prices of a specific commodity in a specific market (that can be a lot of records) and then I do:
ORDER BY price_date DESC LIMIT 1
I think that maybe this way can be computationally expensive because the first part of the query (the entire query except the last ORDER BY statment will retrieve a lot of records that then have to be ordered.
Is it a problem? Eventually what could be a smart strategy to optimize this query? (I have no idea...)
回答1:
Sanity check... are MD_CD.commodity_details_id
and CN.commodity_details_id
different? Or is it a typo?
Get rid of
INNER JOIN Languages AS LN ON CN.language_id = LN.id
...
AND LN.id=1
Instead, simply have AND CN.language_id = 1
Run EXPLAIN SELECT ...
whenever you try a formulation.
It seems that most things are centered around commodity_details_id = 4
. So say AND MD_CD.commodity_details_id = 4
, not MD.id = 4
and CD.id = 4
.
Now, let's try to reformulate it for a little bit of efficiency. Let's start with the main filter -- AND MD_CD.commodity_details_id = 4
, so let's say FROM MD_CD JOIN ...
. Then add JOINs
until we can get to MCPS
.price_dateso we can work on
ORDER BY price_date DESC LIMIT 1`:
SELECT very-few-columns
FROM MD_CD
JOIN ... (the minimal set of tables)
JOIN MCPS ON ...
WHERE MD_CD.commodity_details_id = 4
ORDER BY MCPS.price_date DESC LIMIT 1
Before going further, debug that above to make sure it is providing the correct one row.
To finish out the query, there are two approaches:
Plan A: Use that is a 'derived table' and add more JOINs on:
SELECT lots-of-stuff
FROM ( the above query ) AS x
JOIN the rest of the tables
Plan B: Use subqueries in the SELECT for the missing columns:
SELECT some stuff,
( SELECT commodity_name_en FROM CN
WHERE commodity_details_id = CD.id ) AS commodity_name_en,
etc
FROM ... (as above)
(And heed my advice on many-to-many schema in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table )
来源:https://stackoverflow.com/questions/46605178/how-can-i-optimize-a-query-that-returns-a-lot-of-records-then-order-by-a-date-f