问题
Suppose I am developing an application with a single database server and multiple application servers, where it is cheap and easy to add application servers but difficult to scale the database. Suppose I want to retrieve some information from the database that needs to be sorted. All else being equal, it seems like I should prefer to sort on the application servers, since that shifts the load away from the database, which is hard to scale.
Now there are certainly some cases in which sorting on the database server is a no-brainer:
- Sorting is necessary in order to obtain the correct result set. For example, if I want the top N according to some criterion, I obviously have to sort before I even know which rows I want. Sorting on the application server isn't an option here (unless I'm willing to suck down the entire table, which is typically not what I want to do).
- There is an index that supports my sort order. In this case sorting on the database server is essentially free.
But other than that, am I generally correct to prefer sorting on the application server? Are there some cases I should consider in addition to those listed above?
回答1:
My instinct is to sort the data on the database server, as that is one of it's prime functions and it is likely extremely efficient at it. The danger however is that the data may get resorted anyway at the client level, thus wasting processes.
If you have a database server that is so stressed it can no longer sort data quickly, you have bigger problems.
If the majority of the queries running on a server have been optimized, if the schema is rational, and the indexes are in place, a database server can do an enormous amount of work without even breaking a sweat.
回答2:
I'll supplement Jaimal's comment with my own experience using the PostgreSQL DBMS. If you have a large shared buffer pool and you can prepare your statements which you are concerned about sort performance, you get a high-performance cache "for free" from your DBMS. If your queries cannot be prepared but you can limit the attributes you need in the result set, you can make an index on those attributes with your sort predicate. If you cannot perform any of these optimizations on the back end, then sorting in the application server will work well.
Regarding performance differences between sorting in an application and in the DBMS, I would expect the application language to have some overhead depending on it's object model. For example, I would expect sorting 1000000 Ruby objects versus 1000000 PostgreSQL tuples would show that the database is faster.
回答3:
I believe that you are right. In the absence of an index the database has no performance advantage over sorting on your application server. In fact, on your app server you have control over which sorting algorithm you use, so in principle you could use something like radix sort (O(n) time) rather than quicksort if it applies to your case.
回答4:
If your data doesn't change to often (you're willing to cache data) and you have a limited number of possible result sets, then you could sort on the database, but cache the result set or cache an array of keys for the result set saving having to do always perform the same sort of the same data.
来源:https://stackoverflow.com/questions/5870929/sorting-on-database-server-or-application-server-in-n-tier-architecture