One 400GB table, One query - Need Tuning Ideas (SQL2005)

予麋鹿 2021-01-30 15:03

I have a single large table which I would like to optimize. I'm using MS-SQL 2005 server. I'll try to describe how it is used, and if anyone has any suggestions I would appreciate it.

24 Answers
  •  梦谈多话
    2021-01-30 15:44

    As I hinted in a comment, I have done this with a single Oracle table approaching 8 TB consisting of over two billion rows growing at the rate of forty million rows per day. However, in my case, the users were two million (and growing) customers accessing this data over the web, 24x7, and literally ANY of the rows was subject to being accessed. Oh, and new rows had to be added within two minutes of real-time.

    You are probably I/O bound, not CPU or memory bound, so optimizing the disk access is critical. Your RAM is fine--more than adequate. Using multiple cores would be helpful, but limited if the I/O is not parallelized.
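
    One quick way to check whether you really are I/O bound is to look at the server's wait and per-file statistics. A minimal sketch, assuming the SQL Server 2005 DMVs and a placeholder database name:

        -- Where is the server waiting?  High PAGEIOLATCH_* waits relative to the
        -- total usually point to a disk bottleneck.
        SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
        FROM sys.dm_os_wait_stats
        ORDER BY wait_time_ms DESC;

        -- Per-file read/write stalls for the database in question
        -- ('YourDatabase' is a placeholder name).
        SELECT DB_NAME(database_id) AS database_name, file_id,
               num_of_reads, io_stall_read_ms,
               num_of_writes, io_stall_write_ms
        FROM sys.dm_io_virtual_file_stats(DB_ID('YourDatabase'), NULL);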

    Several people have suggested splitting up the data, which should be taken seriously since it is far better and more effective than any other solution (nothing is faster than not touching the data at all).

    You say you can't split the data because all the data is used: IMPOSSIBLE! There is no way that your users are paging through one million rows per day or one hundred million rows total. So, get to know how your users are ACTUALLY using the data--look at every query in this case.

    More importantly, we are not saying that you should DELETE the data, we are saying to SPLIT the data. Clone the table structure into multiple, similarly-named tables, probably based on time (one month per table, perhaps). Copy the data into the relevant tables and delete the original table. Create a view that performs a union over the new tables, with the same name as the original table. Change your insert processing to target the newest table (assuming that it is appropriate), and your queries should still work against the new view.
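
    As a rough sketch of what that split can look like, using hypothetical table names, columns, and date ranges rather than your actual schema:

        -- One clone of the original structure per month, with a CHECK constraint
        -- on the partitioning column (names here are placeholders).
        CREATE TABLE dbo.BigTable_200901 (
            RowID     BIGINT       NOT NULL PRIMARY KEY,
            EventDate DATETIME     NOT NULL,
            Payload   VARCHAR(200) NULL,
            CONSTRAINT CK_BigTable_200901
                CHECK (EventDate >= '20090101' AND EventDate < '20090201')
        );
        CREATE TABLE dbo.BigTable_200902 (
            RowID     BIGINT       NOT NULL PRIMARY KEY,
            EventDate DATETIME     NOT NULL,
            Payload   VARCHAR(200) NULL,
            CONSTRAINT CK_BigTable_200902
                CHECK (EventDate >= '20090201' AND EventDate < '20090301')
        );
        GO
        -- After copying the data and dropping the original table, a view takes
        -- over the original name so existing queries keep working.
        CREATE VIEW dbo.BigTable AS
            SELECT RowID, EventDate, Payload FROM dbo.BigTable_200901
            UNION ALL
            SELECT RowID, EventDate, Payload FROM dbo.BigTable_200902;
        GO
        -- With the CHECK constraints in place, the optimizer can skip member
        -- tables whose date range cannot match a query's predicate.

    Inserts then go straight to the newest monthly table, as described above.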

    Your savvy users can now start to issue their queries against a subset of the tables, perhaps even the newest one only. Your unsavvy users can continue to use the view over all the tables.

    You now have a data management strategy in the form of archiving the oldest table and deleting it (update the view definition, of course). Likewise, you will need to create a new table periodically and update the view definition for that end of the data as well.
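
    The rolling-window maintenance then reduces to a handful of statements, again with placeholder names:

        -- Hypothetical monthly maintenance.
        -- 1. Archive the oldest member (backup, bcp out, etc.), then drop it:
        DROP TABLE dbo.BigTable_200901;
        GO
        -- 2. Create next month's table with the same structure and CHECK
        --    constraint, then redefine the view over the current members:
        ALTER VIEW dbo.BigTable AS
            SELECT RowID, EventDate, Payload FROM dbo.BigTable_200902
            UNION ALL
            SELECT RowID, EventDate, Payload FROM dbo.BigTable_200903;
        GO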

    Expect not to be able to use unique indexes: they don't scale beyond about one to two million rows. You may have to modify some of the other common tactics and advice as well. At one hundred million rows and 400 GB, you have entered another realm of processing.

    Beyond that, use the other suggestions--analyze the actual performance using the many tools already available in SQL Server and the OS. Apply the many well-known tuning techniques that are readily available on the web or in books.
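
    For example, the plan-cache statistics will show which statements actually drive the reads; a sketch using the SQL Server 2005 DMVs:

        -- Top statements by logical reads since the plan cache was last cleared.
        SELECT TOP 20
               qs.total_logical_reads,
               qs.execution_count,
               SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                         (CASE WHEN qs.statement_end_offset = -1
                               THEN DATALENGTH(st.text)
                               ELSE qs.statement_end_offset END
                          - qs.statement_start_offset) / 2 + 1) AS statement_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_logical_reads DESC;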

    However, do NOT experiment! With that much data, you don't have time for experiments and the risk is too great. Study carefully the available techniques and your actual performance details, then choose one step at a time and give each one a few hours to days to reveal its impact.
