If your algorithm is CPU-heavy, consider parallelising it. For sorting, you can split the data into chunks, sort each chunk on its own thread, and then merge the sorted chunks back together.
This is not a decision to take lightly, however: concurrent code is hard to write correctly.