One for all you MySQL experts :-)
I have the following query:
SELECT o.*, p.name, p.amount, p.quantity
FROM orders o, products p
WHERE o.id = p.order_i
SELECT *:
Selecting all columns with the * wildcard will cause the query's meaning and behavior to change if the table's schema changes, and might cause the query to retrieve too much data.
The != operator is non-standard:
Use the <> operator to test for inequality instead.
Aliasing without the AS keyword:
Explicitly using the AS keyword in column or table aliases, such as "tbl AS alias", is more readable than implicit aliases such as "tbl alias".
I'm not a MySQL expert (more SQL Server), but I think you'd be better off with an index on o.timestamp, and you need to rewrite the date condition in your query like this:
o.timestamp >= '2012-01-01' and o.timestamp < '2012-01-31' + INTERVAL 1 DAY
The logic is that an index will not be used if you compare an expression on the column (such as DATE(o.timestamp)) with constants. You need to compare the bare column with constants.
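A minimal sketch of the index this answer suggests, assuming the table is called orders as in the question. The index name idx_orders_timestamp is made up for illustration.

```sql
-- Hypothetical index name; the column comes from the question's query.
CREATE INDEX idx_orders_timestamp ON orders (timestamp);

-- The bare column on the left lets the optimiser consider the index.
-- Wrapping it in a function, as in DATE(o.timestamp), generally prevents that.
SELECT o.*
FROM orders o
WHERE o.timestamp >= '2012-01-01'
  AND o.timestamp <  '2012-01-31' + INTERVAL 1 DAY;
```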
First, I would use a different style of syntax. ANSI-92 has had 20 years to bed in, and many RDBMS actually recommend not using the notation you have used. It's not going to make a difference in this case, but it really is very good practice for a host of reasons (that I'll let you investigate and make a decision on yourself).
Final answer, and example syntax:
SELECT
  orders.*, products.name, products.amount, products.quantity
FROM
  orders
INNER JOIN
  products
    ON orders.id = products.order_id
WHERE
      orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
  AND orders.total != '0.00'
ORDER BY
  orders.timestamp ASC
As the orders table is the one you are making the initial filtering on, that's a very good place to start looking at optimisation.
With DATE(o.timestamp) BETWEEN x AND y you succeed in getting all dates and times in January. But that requires calling the DATE() function on every single row in the orders table (similar to what RBAR, "row by agonising row", means). The RDBMS can't see through the function to work out how to avoid wasting time. Instead we need to do that optimisation ourselves, by re-arranging the maths so that no function is needed on the field we are filtering.
orders.timestamp >= '2012-01-01'
AND orders.timestamp < '2012-02-01'
This version allows the optimiser to know that you want a block of dates that are all sequential with each other. It's called a range-seek. It can use an index to very quickly find the first record and last record that fit that range, then pick out every record in between. That avoids checking all the records that don't fit, and even avoids checking all the records in the middle of the range; only the boundaries need to be sought out.
That assumes all the records are ordered by date, and that the optimiser can see that. To do so you need an index. With that in mind there seem to be two basic composite indexes that you could use:
- (id, timestamp)
- (timestamp, id)
The first is what I see people use the most. But that forces the optimiser to do the timestamp range-seek for each id separately. And since every id likely has a different timestamp value, you've gained nothing.
The second index is what I recommend.
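As a rough sketch, the recommended index could be created like this. The index name idx_orders_timestamp_id is hypothetical, and the statement assumes the column names used in the question.

```sql
-- Hypothetical index name; column order follows the recommendation above:
-- timestamp first, so a date range maps to one contiguous slice of the index.
CREATE INDEX idx_orders_timestamp_id ON orders (timestamp, id);
```

If orders is an InnoDB table with id as its primary key, a secondary index on (timestamp) alone should behave much the same, since InnoDB stores the primary key in every secondary index entry.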
Now, the optimiser can fulfil this part of your query exceptionally quickly...
SELECT
  orders.*
FROM
  orders
WHERE
      orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
ORDER BY
  orders.timestamp ASC
As it happens, even the ORDER BY has been optimised with the suggested index. It's already in the order that you want the data to be output. There is no need to re-sort everything after the join.
Then, to fulfil the total != '0.00' requirement, every row in your range is still checked. But you've already narrowed the range down so much that this will probably be fine. (I won't go into it, but you will likely find it impossible to use indexes in MySQL to optimise both this and the timestamp range-seek.)
Then, you have your join. That's optimised by an index you already have (products.order_id). For every record picked out by the snippet above, the optimiser can do an index seek and very quickly identify the matching record(s).
This all assumes that, in the vast majority of cases, every order row has one or more product rows. If, for example, only a very select few orders had any product rows, it may be faster to pick out the product rows of interest first; essentially looking at the joins happening in reverse order.
The optimiser actually makes that decision for you, but it's handy to know that it's doing that, then provide the indexes you estimate will be most useful to it.
You can check the explain plan to see if the indexes are being used. If not, your attempt to help was ignored, probably because the statistics of the data implied that a different join order was better. If so, you can then provide indexes to help that join order instead.
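For example, prefixing the rewritten query with EXPLAIN asks MySQL to report its plan instead of running the query. This is only a sketch against the table and column names used in this thread.

```sql
-- Shows the chosen plan: one output row per table, in join order.
EXPLAIN
SELECT orders.*, products.name, products.amount, products.quantity
FROM orders
INNER JOIN products
        ON orders.id = products.order_id
WHERE orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
  AND orders.total != '0.00'
ORDER BY orders.timestamp ASC;
```

In the output, the key column names the index picked for each table (NULL means none was used), type shows the access method (range is what you hope to see for the timestamp filter), and rows is the optimiser's estimate of how many rows it will examine.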
Use EXPLAIN to see how the query is executed and where it can be optimised. I'd suggest starting with indices on total and timestamp.
You may find removing the DATE() function improves performance.
You should use modern syntax.
e.g.
SELECT o.*, p.name, p.amount, p.quantity
FROM orders o
INNER JOIN products p
ON o.id = p.order_id
WHERE o.total != '0.00'
AND o.timestamp BETWEEN '2012-01-01' AND '2012-01-31 23:59:59'
ORDER BY o.timestamp ASC