I've got a real little interesting (at least to me) problem to solve (and, no, it is not homework). It is equivalent to this: you need to determine "sessions" and "sessions
Maximum Delay
If the log entries have a "maximum delay" (e.g. with a maximum delay of 2 hours, an 8:12 event will never be listed after a 10:12 event), you could look ahead and sort.
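Here is a minimal sketch of that look-ahead idea, assuming timestamps are plain numbers (minutes, say) and that the maximum-delay guarantee really holds. The function name `sort_with_max_delay` and the heap-based buffer are my own choices for illustration, not something from the question:

```python
import heapq

def sort_with_max_delay(events, max_delay):
    """Yield timestamps in sorted order.

    Assumption: an event stamped t is never listed after an event
    stamped t + max_delay or later (the "maximum delay" guarantee).
    """
    buffer = []    # min-heap of events that may still be overtaken
    newest = None  # largest timestamp seen so far
    for ts in events:
        heapq.heappush(buffer, ts)
        if newest is None or ts > newest:
            newest = ts
        # Anything at or below newest - max_delay can no longer be
        # preceded by a future event, so it is safe to emit now.
        while buffer and buffer[0] <= newest - max_delay:
            yield heapq.heappop(buffer)
    # End of input: flush whatever is left, smallest first.
    while buffer:
        yield heapq.heappop(buffer)
```

The buffer only ever holds events from the last `max_delay` window, so memory stays bounded by how many events fit in that window rather than by the size of the log.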
Do Sort
Alternatively, I'd first try sorting, if only to make sure it doesn't work. A timestamp can reasonably be stored in 8 bytes (or even 4 for your purposes, in which case 250 million of them fit into a gigabyte). Quicksort might not be the best choice here as it has low locality; insertion sort is almost perfect for almost-sorted data (though it has bad locality, too). Alternatively, quicksorting chunk-wise and then merging the sorted chunks, as in a merge sort, should do, even though it increases memory requirements.
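The chunk-then-merge variant could look roughly like this. It is only a sketch: `chunked_sort` and `chunk_size` are hypothetical names, Python's built-in `sorted` (Timsort, not quicksort) stands in for the per-chunk sort, and `array("I")` is used to keep timestamps compact (its item size is platform dependent, typically 4 bytes):

```python
import heapq
from array import array

def chunked_sort(timestamps, chunk_size):
    """Sort fixed-size chunks independently, then k-way merge them.

    Returns a lazy iterator over all timestamps in ascending order.
    """
    chunks = []
    chunk = array("I")  # compact unsigned integers
    for ts in timestamps:
        chunk.append(ts)
        if len(chunk) == chunk_size:
            # Sort the full chunk and start a new one.
            chunks.append(array("I", sorted(chunk)))
            chunk = array("I")
    if chunk:
        # Last, possibly shorter, chunk.
        chunks.append(array("I", sorted(chunk)))
    # heapq.merge walks all sorted chunks in parallel.
    return heapq.merge(*chunks)
```

Each chunk is sorted while it is small enough to stay cache-friendly; the extra memory mentioned above is the price of holding all sorted chunks for the merge.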
Squash and conquer
Alternatively, you can use the following strategy:
If your log files have the kind of "temporal locality" your question suggests, a single pass should already reduce the data enough to allow a "full" sort.
[edit] This site demonstrates an "optimized quicksort with insertion sort finish" that's quite good on almost-sorted data, as is this guy's std::sort.
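For reference, the general idea behind a "quicksort with insertion sort finish" can be sketched as follows. This is my own illustration of the technique, not the code from the linked site; `hybrid_sort` and the `cutoff` value of 16 are arbitrary choices:

```python
def insertion_sort(a):
    """Plain in-place insertion sort; cheap when a is almost sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = x

def quicksort_coarse(a, lo, hi, cutoff):
    """Partition a[lo..hi] but leave pieces of size <= cutoff unsorted."""
    while hi - lo + 1 > cutoff:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side, keep looping on the larger one.
        if j - lo < hi - i:
            quicksort_coarse(a, lo, j, cutoff)
            lo = i
        else:
            quicksort_coarse(a, i, hi, cutoff)
            hi = j

def hybrid_sort(a, cutoff=16):
    """Coarse quicksort, then one insertion sort pass over everything."""
    quicksort_coarse(a, 0, len(a) - 1, cutoff)
    insertion_sort(a)
```

After the coarse pass every element sits within a small block of its final position, which is exactly the almost-sorted case where insertion sort is fast.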