I am writing code for an external merge sort. The idea is that the input files contain too many numbers to be stored in an array, so you read part of the data at a time, sort each chunk in memory, and write the sorted chunks out to temporary files to be merged later.
We have implemented a public domain external sort in Java:
http://code.google.com/p/externalsortinginjava/
It might be faster than yours. We sort strings rather than integers, but you could easily adapt our code by substituting integers for strings (the code was made hackable by design). At the very least, you can compare your design with ours.
Looking at your code, it seems you are reading the data one integer at a time, so I/O will likely be your bottleneck. With external-memory algorithms you want to read and write blocks of data, especially in Java.
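For instance, here is a minimal sketch of reading integers through a buffered stream instead of paying one disk access per int (the file name input.bin and the 1 MiB buffer size are illustrative assumptions, not anything from your code):

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class BufferedIntReader {
        public static void main(String[] args) throws IOException {
            long sum = 0;
            // The 1 MiB BufferedInputStream means each readInt() is served
            // from memory rather than hitting the disk for 4 bytes.
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream("input.bin"), 1 << 20))) {
                while (true) {
                    int value;
                    try {
                        value = in.readInt();   // 4 bytes, big-endian
                    } catch (EOFException end) {
                        break;                  // clean end of input
                    }
                    sum += value;               // stand-in for real processing
                }
            }
            System.out.println(sum);
        }
    }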
I would use memory-mapped files. They can be as much as 10x faster than this kind of stream-based I/O, and I suspect they would be much faster in this case as well. Mapped buffers use virtual memory rather than heap space to store data and can be larger than your available physical memory.
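A sketch of the same read loop over a memory-mapped file with java.nio (again assuming a file input.bin of big-endian ints; note that a single mapping is limited to 2 GiB, so a larger file would need several mappings):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.IntBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedIntReader {
        public static void main(String[] args) throws IOException {
            long sum = 0;
            try (RandomAccessFile file = new RandomAccessFile("input.bin", "r");
                 FileChannel channel = file.getChannel()) {
                // Map the whole file into virtual memory (at most 2 GiB per mapping).
                MappedByteBuffer bytes =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                IntBuffer ints = bytes.asIntBuffer();  // view the mapping as ints
                while (ints.hasRemaining()) {
                    sum += ints.get();                 // stand-in for real processing
                }
            }
            System.out.println(sum);
        }
    }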
You are sorting integers, so you should check out radix sort. The core idea of radix sort is that you can sort n-byte integers in n passes over the data with radix 256, one pass per byte; for 32-bit ints that is 4 passes.
You can combine this with the merge phase of external merge sort: radix-sort each chunk that fits in memory, then merge the sorted runs as usual.
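For illustration, a minimal LSD radix-256 sort for an in-memory int[] (my own untuned sketch; XOR-ing with Integer.MIN_VALUE flips the sign bit so that signed ints order correctly as unsigned byte keys):

    public class RadixSort {
        // Sorts a[] with four counting-sort passes, one per byte,
        // least significant byte first. Each pass is stable.
        static void radixSort(int[] a) {
            int[] tmp = new int[a.length];
            for (int shift = 0; shift < 32; shift += 8) {
                int[] count = new int[257];
                for (int x : a) {
                    // Sign-bit flip makes signed ints compare as unsigned keys.
                    count[(((x ^ Integer.MIN_VALUE) >>> shift) & 0xFF) + 1]++;
                }
                for (int i = 0; i < 256; i++) {
                    count[i + 1] += count[i];             // prefix sums = bucket offsets
                }
                for (int x : a) {
                    tmp[count[((x ^ Integer.MIN_VALUE) >>> shift) & 0xFF]++] = x;
                }
                System.arraycopy(tmp, 0, a, 0, a.length); // this pass's output feeds the next
            }
        }

        public static void main(String[] args) {
            int[] a = { 3, -1, 42, 0, -7 };
            radixSort(a);
            System.out.println(java.util.Arrays.toString(a)); // [-7, -1, 0, 3, 42]
        }
    }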
You might wish to merge k > 2 segments at a time. This reduces the amount of I/O from about n log n / log 2 to n log n / log k, since each pass over the data merges k runs instead of 2. For example, with k = 16 you need log 16 / log 2 = 4 times fewer merge passes than with 2-way merging.
Edit: In pseudocode, this would look something like this:
    void sort(List list) {
        if (list fits in memory) {
            list.sort();
        } else {
            sublists = partition list into k roughly equal-sized sublists
            for (sublist : sublists) {
                sort(sublist);
            }
            merge(sublists);
        }
    }
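A rough Java rendering of the split-and-sort phase, under the assumptions that the input holds big-endian ints and that CHUNK_INTS values fit comfortably in the heap (all names here are mine for illustration, not from an existing library):

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    public class RunGenerator {
        static final int CHUNK_INTS = 1 << 20;  // ~4 MiB of ints per run; an assumption

        // Reads the input in chunks, sorts each chunk in memory, and writes
        // one sorted "run" file per chunk. Returns the run files for merging.
        static List<Path> createSortedRuns(Path input) throws IOException {
            List<Path> runs = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(input)))) {
                int[] buf = new int[CHUNK_INTS];
                while (true) {
                    int n = 0;
                    try {
                        while (n < buf.length) {
                            buf[n] = in.readInt();
                            n++;
                        }
                    } catch (EOFException end) {
                        // fewer than CHUNK_INTS values remained; n is still valid
                    }
                    if (n == 0) break;                    // nothing left to sort
                    Arrays.sort(buf, 0, n);               // in-memory sort of this chunk
                    Path run = Files.createTempFile("run", ".bin");
                    try (DataOutputStream out = new DataOutputStream(
                            new BufferedOutputStream(Files.newOutputStream(run)))) {
                        for (int i = 0; i < n; i++) out.writeInt(buf[i]);
                    }
                    runs.add(run);
                    if (n < buf.length) break;            // hit EOF mid-chunk
                }
            }
            return runs;
        }
    }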
    void merge(List[] sortedSublists) {
        keep a pointer into each sublist, initially at its first element
        do {
            find the pointer pointing at the smallest element
            append that element to the result list
            advance that pointer
        } until all pointers have reached the end of their sublist
        return the result list
    }
To efficiently find the "smallest" pointer, you might employ a PriorityQueue.
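Continuing the same assumptions as the run-generation sketch above, here is what the k-way merge over the run files could look like, with a PriorityQueue keyed on each run's current smallest element:

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    public class KWayMerge {
        // One open run file plus its current (smallest unconsumed) value.
        static final class RunCursor {
            final DataInputStream in;
            int head;
            RunCursor(DataInputStream in) { this.in = in; }
            // Loads the next value into head; returns false when the run is exhausted.
            boolean advance() throws IOException {
                try {
                    head = in.readInt();
                    return true;
                } catch (EOFException end) {
                    in.close();
                    return false;
                }
            }
        }

        static void merge(List<Path> runs, Path output) throws IOException {
            PriorityQueue<RunCursor> heap =
                    new PriorityQueue<>(Comparator.comparingInt((RunCursor c) -> c.head));
            for (Path run : runs) {
                RunCursor c = new RunCursor(new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(run))));
                if (c.advance()) heap.add(c);     // skip empty runs
            }
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(output)))) {
                while (!heap.isEmpty()) {
                    RunCursor c = heap.poll();    // cursor holding the smallest head
                    out.writeInt(c.head);
                    if (c.advance()) heap.add(c); // re-insert with its next value
                }
            }
        }
    }

Each poll/advance pair costs O(log k), so merging n elements from k runs takes O(n log k) comparisons plus a single sequential I/O pass; tying the two sketches together would look like merge(RunGenerator.createSortedRuns(input), output).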