I have a set of tasks, let\'s call it T[]
, where each task T[i]
needs a certain amount of time t(T[i])
to be processed. The tasks are bein
Simulating the test run is the solution when the execution order is (roughly) deterministic. I just took my real processing code, and replaced it with a simple Thread.sleep, while sleeping for the time the task is expected to take to process (just interpreted as milliseconds to scale it down). In the end, I just scaled up the time it took again and the result is quite good. I ran it with nearly 100 tasks with vastly different execution times, on 5 threads. It estimated 1 hour 39 minutes, and the real run was off only by 3 minutes.
long startSim = currentTimeMillis();
List<Integer> taskTimes = parallelTests.getRuntimesForAllTests(); // ordered from longest time
ThreadPoolExecutor simulationExecutor = (ThreadPoolExecutor) Executors.newFixedThreadPool(threadCount);
taskTimes.forEach(taskTime -> simulationExecutor.submit(() -> {
try {
Thread.sleep(taskTime); // this is really seconds, but we just take it as milliseconds
} catch (InterruptedException e) {
e.printStackTrace();
}
}));
simulationExecutor.shutdown();
simulationExecutor.awaitTermination(1, MINUTES);
long stopSim = currentTimeMillis();
long timeNeeded = stopSim - startSim;
// now just multiply it *1000 to scale it up to seconds again, and that's your result
I assume that the tasks are scheduled in the order that they're provided, and that each task goes to the first thread that's free. There's no meaningful non-determinism if these assumptions are correct -- a task may go to any of the threads that are free (if there's more than one), but this has no effect on the total running time.
In that case, we can simulate this using a min-heap of size X (where X is the number of threads), with the values in the heap representing the free time of one of the threads. For each task, we pop the earliest-free thread off the heap, and then push it back with the time it'll finish this new task.
After we've scheduled all tasks, we can take the largest value in the heap, which will be the time at which all tasks are completed.
This is relatively little code in Python:
import heapq
def compute_time(tasks, X):
threads = [0] * X
for t in tasks:
heapq.heappush(threads, heapq.heappop(threads) + t)
return max(threads)
print compute_time([3, 2, 1], 2)
print compute_time([5, 4, 3, 3, 2, 1, 1], 3)
Or in Java:
import java.util.*;
class Threads {
public static void main(String args[]) {
int totalTime1 = computeTotalTime(Arrays.asList(3, 2, 1), 2);
System.out.println("totalTime1: " + totalTime1);
int totalTime2 = computeTotalTime(Arrays.asList(5, 4, 3, 3, 2, 1, 1), 3);
System.out.println("totalTime2: " + totalTime2);
}
static int computeTotalTime(List<Integer> task, int threads) {
PriorityQueue<Integer> q = new PriorityQueue<Integer>();
for (int i = 0; i < threads; i++) q.add(0);
for (int t : task) q.add(q.poll() + t);
int max = 0;
while(!q.isEmpty()) max = q.poll();
return max;
}
}