How to optimize the SQL query that joins 3 table

て烟熏妆下的殇ゞ 提交于 2019-12-23 23:42:16

问题


Howdie do,

I have the following SQL query which takes about 4 seconds to run:

select 
    o.id, tu.status_type, m.upload_date
from 
    (select order_id, max(last_updated) as maxudate from tracking_update group by order_id) t
inner join 
    tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.last_updated
right join 
    fgw247.order o on t.order_id=o.id
left join 
    manifest m on o.manifest_id=m.id
where 
    (m.upload_date >= '2015-12-12 00:00:00') or (m.upload_date <='2015-12-12 00:00:00' and tu.status_type != 'D' and tu.status_type != 'XD')

The query joins the following 3 tables:

Order, Manifest and Tracking_Update.

The query will return orders that meet the following criteria:

  1. Orders < 30 days with any delivery status
  2. Orders > 30 days and not a delivery status

An order is considered delivered if the latest tracking_update for that order has a status_type of 'D' or 'XD'

Now, the orders are listed in the Order table. The Order table has a column called manifest_id which references the Manifest that was created when the order was uploaded.

A manifest can have multiple orders. This is what is used to determine if an order has been uploaded in the past 30 days.

Finally, the tracking_update table contains the tracking_updates per order. An order can have multiple tracking_updates.

Currently, the tracking_update table is over 1M records long.

Listed below are the create statements for each table:

CREATE TABLE "order" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "ShipmentId" varchar(50) DEFAULT NULL,
  "RecipientName" varchar(160) DEFAULT NULL,
  "CompanyName" varchar(160) DEFAULT NULL,
  "Address1" varchar(160) DEFAULT NULL,
  "Address2" varchar(160) DEFAULT NULL,
  "City" varchar(50) DEFAULT NULL,
  "State" varchar(3) DEFAULT NULL,
  "ZIP" int(11) DEFAULT NULL,
  "TEL" int(11) DEFAULT NULL,
  "Email" varchar(255) DEFAULT NULL,
  "Bottles" int(11) DEFAULT NULL,
  "Weight" float DEFAULT NULL,
  "Resi" tinyint(1) DEFAULT NULL,
  "Wave" int(11) DEFAULT NULL,
  "url_slug" varchar(50) DEFAULT NULL,
  "manifest_id" int(11) DEFAULT NULL,
  "shipment_date" datetime DEFAULT NULL,
  "tracking_number" varchar(30) DEFAULT NULL,
  "shipping_carrier" varchar(10) DEFAULT NULL,
  "last_tracking_update" datetime DEFAULT NULL,
  "delivery_date" datetime DEFAULT NULL,
  "customer_code" varchar(50) DEFAULT NULL,
  "sub_cust_code" int(11) DEFAULT NULL,
  PRIMARY KEY ("id"),
  UNIQUE KEY "ShipmentID" ("ShipmentId"),
  KEY "manifest_id" ("manifest_id"),
  KEY "order_idx3" ("tracking_number"),
  KEY "order_idx4" ("customer_code"),
  CONSTRAINT "order_ibfk_1" FOREIGN KEY ("manifest_id") REFERENCES "manifest" ("id")
);

Tracking_Update table:

CREATE TABLE "tracking_update" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "order_id" int(11) DEFAULT NULL,
  "ship_update_date" datetime DEFAULT NULL,
  "message" varchar(400) DEFAULT NULL,
  "location" varchar(100) DEFAULT NULL,
  "status_type" varchar(2) DEFAULT NULL,
  "last_updated" datetime DEFAULT NULL,
  "hash" varchar(32) DEFAULT NULL,
  PRIMARY KEY ("id"),
  KEY "order_id" ("order_id"),
  KEY "tracking_update_idx2" ("status_type"),
  KEY "tracking_update_idx3" ("ship_update_date"),
  CONSTRAINT "tracking_update_ibfk_1" FOREIGN KEY ("order_id") REFERENCES "order" ("id")
);

Manifest table:

CREATE TABLE "manifest" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "upload_date" datetime DEFAULT NULL,
  "name" varchar(100) DEFAULT NULL,
  "destination_gateway" varchar(40) DEFAULT NULL,
  "arrived" tinyint(1) DEFAULT NULL,
  "customer_code" varchar(50) DEFAULT NULL,
  "upload_user" varchar(50) DEFAULT NULL,
  "trip_id" int(11) DEFAULT NULL,
  PRIMARY KEY ("id")
);

Here is also the EXPLAIN on the select statement exported with JSON:

{
    "data":
    [
        {
            "id": 1,
            "select_type": "PRIMARY",
            "table": "m",
            "type": "ALL",
            "possible_keys": "PRIMARY",
            "key": null,
            "key_len": null,
            "ref": null,
            "rows": 220,
            "Extra": "Using where"
        },
        {
            "id": 1,
            "select_type": "PRIMARY",
            "table": "o",
            "type": "ref",
            "possible_keys": "manifest_id",
            "key": "manifest_id",
            "key_len": "5",
            "ref": "fgw247.m.id",
            "rows": 246,
            "Extra": "Using index"
        },
        {
            "id": 1,
            "select_type": "PRIMARY",
            "table": "tu",
            "type": "ref",
            "possible_keys": "order_id",
            "key": "order_id",
            "key_len": "5",
            "ref": "fgw247.o.id",
            "rows": 7,
            "Extra": "Using where"
        },
        {
            "id": 1,
            "select_type": "PRIMARY",
            "table": "<derived2>",
            "type": "ref",
            "possible_keys": "<auto_key0>",
            "key": "<auto_key0>",
            "key_len": "11",
            "ref": "fgw247.o.id,fgw247.tu.last_updated",
            "rows": 13,
            "Extra": "Using index"
        },
        {
            "id": 2,
            "select_type": "DERIVED",
            "table": "tracking_update",
            "type": "index",
            "possible_keys": "order_id",
            "key": "order_id",
            "key_len": "5",
            "ref": null,
            "rows": 1388275,
            "Extra": null
        }
    ]
}

Any help is appreciated

UPDATE

This is the current query that I'm using which is MUCH faster:

SELECT 
    o.*, tu.*
FROM
    fgw247.`order` o
JOIN
    manifest m
ON
    o.`manifest_id` = m.`id` 

JOIN
    `tracking_update` tu
ON
    tu.`order_id` = o.`id` and tu.`ship_update_date` = (select max(last_updated) as last_updated from tracking_update where order_id = o.`id` group by order_id)
WHERE
    m.`upload_date` >= '2015-12-14 11:50:12' 
    OR 
        (o.`delivery_date` IS NULL AND m.`upload_date` < '2015-12-14 11:50:12')
LIMIT 100

回答1:


Consdider the the following options to optimise the execution of your query:

  1. Join types: are you sure that you need the left and right joins? Do you want to report on orders that are not even included in a manifest? If not, then use inner join instead of left and right.

  2. Additional indexes: There is no index on manifest.upload_date field, you should add one. I would also create a composite index on order_id, update_date, and status_type fields (in this order!) on tracking_update table.

  3. Use union instead of the or criterion in where: Mysql is traditionally not really good at optimising or criterion in where clauses. So turn the (m.upload_date >= '2015-12-12 00:00:00') or (m.upload_date <='2015-12-12 00:00:00' and tu.status_type != 'D' and tu.status_type != 'XD') into a union with 2 queries. The 2 queries will be the same as the existing query up to the where part. The 1st query will have only the (m.upload_date >= '2015-12-12 00:00:00') condition in the where clause, while the 2nd one will have the (m.upload_date <='2015-12-12 00:00:00' and tu.status_type != 'D' and tu.status_type != 'XD') criteria.

If you add new indexes, then pls check if the query uses them by running a new explain.



来源:https://stackoverflow.com/questions/34725091/how-to-optimize-the-sql-query-that-joins-3-table

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!