Join table on itself - performance

Submitted by 谁说胖子不能爱 on 2019-12-05 06:47:36

Question


I would like some help with the following join. I have one table (with about 20 million rows) that consists of:

MemberId (Primary Key) | Id (Primary Key) | TransactionDate | Balance

I would like to get the latest Balance for every customer in a single query. I know I could do something like the query below (written from memory), but it is terribly slow:

SELECT * 
FROM money 
WHERE money.Id = (SELECT MAX(Id) 
                  FROM money AS m 
                  WHERE m.MemberId = money.MemberId)

Are there any other (faster/smarter) options?


Answer 1:


In every optimization tutorial and screencast I have sat through, joins are favoured over subqueries. A correlated subquery like the one in the question may be re-executed for each row it is compared against, whereas the derived table in a join is computed only once.

SELECT * 
FROM money m
INNER JOIN (
    SELECT memberId, MAX(id) AS maxid
    FROM money
    GROUP BY memberId
) mmax ON mmax.maxid = m.id AND mmax.memberId = m.memberId
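A minimal sketch of the derived-table join above, run against an in-memory SQLite database as a stand-in for the real server. The sample rows, the column types, and the helper names `make_db` and `latest_by_join` are assumptions made for illustration; the real table has about 20 million rows and may live in a different engine.

```python
import sqlite3

# Hypothetical sample data: (MemberId, Id, TransactionDate, Balance).
ROWS = [
    (1, 1, "2012-01-01", 100.0),
    (1, 2, "2012-01-02", 80.0),
    (2, 3, "2012-01-01", 50.0),
    (2, 4, "2012-01-03", 75.0),
    (3, 5, "2012-01-02", 10.0),
]


def make_db():
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE money ("
        " MemberId INTEGER, Id INTEGER, TransactionDate TEXT, Balance REAL,"
        " PRIMARY KEY (MemberId, Id))"
    )
    conn.executemany("INSERT INTO money VALUES (?, ?, ?, ?)", ROWS)
    return conn


def latest_by_join(conn):
    """Join each row to the per-member MAX(Id) computed in a derived table."""
    sql = """
        SELECT m.MemberId, m.Id, m.TransactionDate, m.Balance
        FROM money m
        INNER JOIN (
            SELECT MemberId, MAX(Id) AS maxid
            FROM money
            GROUP BY MemberId
        ) mmax ON mmax.maxid = m.Id AND mmax.MemberId = m.MemberId
        ORDER BY m.MemberId
    """
    return conn.execute(sql).fetchall()


latest_join_rows = latest_by_join(make_db())
print(latest_join_rows)
```

The columns are listed explicitly here rather than using `SELECT *`, so the columns of the derived table are not repeated in the result.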



Answer 2:


JOINing is not the best way to go about this. Consider using a GROUP BY clause to sift out the last transaction for each member, like this:

SELECT MemberId, MAX(Id), TransactionDate, Balance FROM money GROUP BY MemberId

UPDATE

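As PKK pointed out, Balance will be chosen randomly: TransactionDate and Balance are neither aggregated nor listed in GROUP BY, so they are not guaranteed to come from the row holding MAX(Id). It looks like you'll have to perform some sort of join after all. Consider this option, which assumes Id is unique across the whole table and not just within each member: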

SELECT MemberId, Id, TransactionDate, Balance FROM money WHERE Id IN (
    SELECT MAX(Id) FROM money GROUP BY MemberId
)
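The IN query above matches on Id alone, so its result depends on whether Id values are unique across the whole table. The sketch below, using in-memory SQLite and invented sample rows, runs the same query on two hypothetical data sets: one where Id is globally unique and one where Id restarts for each member (which the composite primary key in the question would permit). The helper name `ids_from_in_query` is an assumption for illustration.

```python
import sqlite3


def ids_from_in_query(rows):
    """Run the IN (SELECT MAX(Id) ...) query and return (MemberId, Id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE money ("
        " MemberId INTEGER, Id INTEGER, TransactionDate TEXT, Balance REAL,"
        " PRIMARY KEY (MemberId, Id))"
    )
    conn.executemany("INSERT INTO money VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT MemberId, Id, TransactionDate, Balance
        FROM money
        WHERE Id IN (SELECT MAX(Id) FROM money GROUP BY MemberId)
        ORDER BY MemberId, Id
    """
    result = [(row[0], row[1]) for row in conn.execute(sql)]
    conn.close()
    return result


# Hypothetical data where Id is unique across the whole table.
unique_ids = [
    (1, 1, "2012-01-01", 100.0),
    (1, 2, "2012-01-02", 80.0),
    (2, 3, "2012-01-01", 50.0),
    (2, 4, "2012-01-03", 75.0),
]

# Hypothetical data where Id restarts at 1 for each member.
per_member_ids = [
    (1, 1, "2012-01-01", 100.0),
    (1, 2, "2012-01-02", 80.0),
    (2, 1, "2012-01-01", 50.0),
    (2, 2, "2012-01-02", 60.0),
    (2, 3, "2012-01-03", 75.0),
]

unique_result = ids_from_in_query(unique_ids)
per_member_result = ids_from_in_query(per_member_ids)
print(unique_result)      # one row per member
print(per_member_result)  # member 2 appears twice
```

In the second data set, member 2's row with Id 2 is returned only because 2 happens to be member 1's maximum. Matching on both MemberId and Id, as the join in Answer 1 does, avoids this.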



Answer 3:


Another option is to look for NULL values in a left join:

SELECT m1.*
  FROM money m1
  LEFT JOIN money m2 ON m2.memberId = m1.memberId AND m2.id > m1.id
 WHERE m2.memberId IS NULL

But of course Umbrella's answer is better.
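For illustration, the left-join approach can be tried on a small in-memory SQLite table. This is only a sketch: the sample rows and the helper name `latest_by_anti_join` are invented, and SQLite stands in for whatever engine holds the real table. A row survives the WHERE clause only when no other row for the same member has a larger Id.

```python
import sqlite3

# Hypothetical sample data: (MemberId, Id, TransactionDate, Balance).
SAMPLE = [
    (1, 1, "2012-01-01", 100.0),
    (1, 2, "2012-01-02", 80.0),
    (2, 3, "2012-01-01", 50.0),
    (2, 4, "2012-01-03", 75.0),
    (3, 5, "2012-01-02", 10.0),
]


def latest_by_anti_join(rows):
    """Keep rows that have no later row (larger Id) for the same member."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE money ("
        " MemberId INTEGER, Id INTEGER, TransactionDate TEXT, Balance REAL,"
        " PRIMARY KEY (MemberId, Id))"
    )
    conn.executemany("INSERT INTO money VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT m1.MemberId, m1.Id, m1.Balance
        FROM money m1
        LEFT JOIN money m2
               ON m2.MemberId = m1.MemberId AND m2.Id > m1.Id
        WHERE m2.MemberId IS NULL
        ORDER BY m1.MemberId
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


anti_join_rows = latest_by_anti_join(SAMPLE)
print(anti_join_rows)
```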



Source: https://stackoverflow.com/questions/8713476/join-table-on-itself-performance
