Multithreaded execution where order of finished Work Items is preserved

Submitted by 試著忘記壹切 on 2020-01-11 16:45:09

Question


I have a flow of units of work, let's call them "Work Items", that are processed sequentially (for now). I'd like to speed up processing by doing the work in multiple threads.

Constraint: the work items arrive in a specific order. During processing the order is irrelevant, but once processing is finished the original order must be restored.

Something like this:

   |.|
   |.|
   |4|
   |3|
   |2|    <- incoming queue
   |1|
  / | \
 2  1  3  <- worker threads
  \ | /
   |3|
   |2|    <- outgoing queue
   |1|

I would like to solve this problem in Java, preferably without Executor Services, Futures, etc., but with basic concurrency methods like wait(), notify(), etc.

The reason: my Work Items are very small and fine-grained; each finishes processing in about 0.2 milliseconds. So I fear that using classes from java.util.concurrent.* might introduce way too much overhead and slow my code down.

The examples I have found so far all preserve the order during processing (which is irrelevant in my case) and don't care about the order after processing (which is crucial in my case).


Answer 1:


This is how I solved your problem in a previous project (but with java.util.concurrent):

(1) WorkItem class does the actual work/processing:

public class WorkItem implements Callable<WorkItem> {
    Object content;
    public WorkItem(Object content) {
        super();
        this.content = content;
    }

    public WorkItem call() throws Exception {
        // getContent() + do your processing
        return this;
    }
}

(2) This class puts Work Items in a queue and initiates processing:

public class Producer {
    ...
    public Producer() {
        super();
        workerQueue = new ArrayBlockingQueue<Future<WorkItem>>(THREADS_TO_USE);
        completionService = new ExecutorCompletionService<WorkItem>(Executors.newFixedThreadPool(THREADS_TO_USE));
        workerThread = new Thread(new Worker(workerQueue));
        workerThread.start();
    }

    public void send(Object o) throws Exception {
        WorkItem workItem = new WorkItem(o);
        Future<WorkItem> future = completionService.submit(workItem);
        workerQueue.put(future);
    }
}

(3) Once processing is finished the Work Items are dequeued here:

public class Worker implements Runnable {
    private ArrayBlockingQueue<Future<WorkItem>> workerQueue = null;

    public Worker(ArrayBlockingQueue<Future<WorkItem>> workerQueue) {
        super();
        this.workerQueue = workerQueue;
    }

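    public void run() {
        while (true) {
            try {
                Future<WorkItem> fwi = workerQueue.take(); // dequeue in submission order
                WorkItem done = fwi.get(); // wait till it has finished; pass `done` on here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (ExecutionException e) {
                e.printStackTrace(); // processing of this item failed
            }
        }
    }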
}

(4) This is how you would use the stuff in your code and submit new work:

public class MainApp {
    public static void main(String[] args) throws Exception {
        Producer p = new Producer();
        for (int i = 0; i < 10000; i++)
            p.send(i);
    }
}
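A condensed, self-contained sketch of the same pattern, for anyone who wants something runnable. Assumptions: the work is a placeholder (squaring an Integer), and the names OrderedFuturePipeline and process are hypothetical. It submits to the ExecutorService directly; as far as I can tell, the ExecutorCompletionService in the Producer above is never polled, so its internal completion queue would keep a reference to every finished future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedFuturePipeline {

    static List<Integer> process(List<Integer> input, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Bounded FIFO of futures (as in the answer): put() blocks once `threads`
        // futures are waiting, so the producer cannot run arbitrarily far ahead.
        BlockingQueue<Future<Integer>> inFlight = new ArrayBlockingQueue<>(threads);
        List<Integer> output = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < input.size(); i++) {
                    // take() returns futures in submission order; get() waits for that item
                    output.add(inFlight.take().get());
                }
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace(); // a real pipeline would report this and stop the producer
            }
        });
        consumer.start();

        for (Integer item : input) {
            inFlight.put(pool.submit(() -> item * item)); // placeholder work
        }
        consumer.join(); // join() also makes `output` safely visible to this thread
        pool.shutdown();
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(i);
        }
        System.out.println(process(input, 4)); // [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    }
}
```

The bounded ArrayBlockingQueue is what keeps memory in check: the producer blocks in put() whenever `threads` futures are still waiting to be consumed.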



Answer 2:


If you allow BlockingQueue, why ignore the rest of Java's concurrency utilities? With Java 8 you could, for example, use a parallel Stream:

List<Type> data = ...;
List<Other> out = data.parallelStream()
    .map(t -> doSomeWork(t))
    .collect(Collectors.toList());

Because you start from an ordered collection (a List) and collect into a List, the results come back in the same order as the input, even though the map step runs in parallel.
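A runnable version of the snippet, assuming Integer inputs and a hypothetical doSomeWork that just formats a string:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamOrder {

    // Hypothetical stand-in for doSomeWork(t) from the snippet above.
    static String doSomeWork(int i) {
        return "item-" + i;
    }

    static List<String> process(List<Integer> data) {
        // map() may run on several threads (the common ForkJoinPool plus the caller)
        // in no particular order, but collect(toList()) keeps encounter order.
        return data.parallelStream()
                .map(ParallelStreamOrder::doSomeWork)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        System.out.println(process(data)); // item-0 through item-9, in order
    }
}
```

The ordering guarantee comes from the collector. If you consume results with forEach on a parallel stream, encounter order is not respected; forEachOrdered is the ordered alternative.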




Answer 3:


Just give each object to be processed an ID, then create a proxy that accepts finished work and releases it only in sequential ID order. Sample code is below. Note how simple it is: an unsynchronized, self-sorting collection (a PriorityQueue, guarded by the synchronized methods) and just 2 simple methods as the API.

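import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialPushingProxy {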

    static class OrderedJob implements Comparable<OrderedJob>{
        static AtomicInteger idSource = new AtomicInteger();
        int id;

        public OrderedJob() {
            id = idSource.incrementAndGet();
        }

        public int getId() {
            return id;
        }

        @Override
        public int compareTo(OrderedJob o) {
            return Integer.compare(id, o.getId());
        }
    }

    int lastId = OrderedJob.idSource.get();

    public Queue<OrderedJob> queue;

    public SequentialPushingProxy() {
        queue = new PriorityQueue<OrderedJob>();
    }

    public synchronized void pushResult(OrderedJob job) {
        queue.add(job);
    }

    List<OrderedJob> jobsToReturn = new ArrayList<OrderedJob>();
    public synchronized List<OrderedJob> getFinishedJobs() {
        while (queue.peek() != null) {
            // only one consumer at a time, will be safe
            if (queue.peek().getId() == lastId+1) {
                jobsToReturn.add(queue.poll());
                lastId++;
            } else {
                break;
            }
        }
        if (jobsToReturn.size() != 0) {
            List<OrderedJob> toRet = jobsToReturn;
            jobsToReturn = new ArrayList<OrderedJob>();
            return toRet;
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        final SequentialPushingProxy proxy = new SequentialPushingProxy();

        int numProducerThreads = 5;

        for (int i=0; i<numProducerThreads; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while(true) {
                        proxy.pushResult(new OrderedJob());
                    }
                }
            }).start();
        }


        int numConsumerThreads = 1;

        for (int i=0; i<numConsumerThreads; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while(true) {
                        List<OrderedJob> ret = proxy.getFinishedJobs();
                        System.out.println("got "+ret.size()+" finished jobs");
                        try {
                            Thread.sleep(200);
                        } catch (InterruptedException e) {
                            // TODO Auto-generated catch block
                            e.printStackTrace();
                        }
                    }
                }
            }).start();
        }


        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        System.exit(0);
    }

}

This code could easily be improved to:

  • allow pushing more than one job result at once, to reduce the synchronization costs
  • introduce a limit to returned collection to get done jobs in smaller chunks
  • extract an interface for those 2 public methods and switch implementations to perform tests
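A sketch of the first two improvements, under assumptions of my own: callers pass each item's sequence number explicitly (starting at 0) instead of relying on a static counter, and the names ReorderBuffer, Tagged, pushAll and drain are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ReorderBuffer<T> {

    // A finished result tagged with the position its input had in the stream.
    static final class Tagged<V> implements Comparable<Tagged<V>> {
        final long seq;
        final V value;

        Tagged(long seq, V value) {
            this.seq = seq;
            this.value = value;
        }

        @Override
        public int compareTo(Tagged<V> o) {
            return Long.compare(seq, o.seq);
        }
    }

    private final PriorityQueue<Tagged<T>> pending = new PriorityQueue<>(); // guarded by this
    private long nextSeq = 0; // the sequence number the consumer needs next

    // Improvement 1: one lock acquisition for a whole batch of results.
    synchronized void pushAll(Collection<Tagged<T>> batch) {
        pending.addAll(batch);
    }

    // Improvement 2: return at most maxItems in-sequence results (possibly none).
    synchronized List<T> drain(int maxItems) {
        List<T> out = new ArrayList<>();
        while (out.size() < maxItems && !pending.isEmpty() && pending.peek().seq == nextSeq) {
            out.add(pending.poll().value);
            nextSeq++;
        }
        return out;
    }

    public static void main(String[] args) {
        ReorderBuffer<String> buf = new ReorderBuffer<>();
        List<Tagged<String>> batch = new ArrayList<>();
        batch.add(new Tagged<>(2, "c"));
        batch.add(new Tagged<>(0, "a"));
        buf.pushAll(batch);
        System.out.println(buf.drain(10)); // [a]  (seq 1 is still missing)
        buf.pushAll(Collections.singletonList(new Tagged<String>(1, "b")));
        System.out.println(buf.drain(1));  // [b]  (capped at one item)
        System.out.println(buf.drain(10)); // [c]
    }
}
```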



Answer 4:


You could have 3 input queues and 3 output queues: one of each for each worker thread.

When you want to insert something, you put it into only one of the 3 input queues, rotating through them in round-robin fashion. The same applies to the output: when you want to take something, you take it from the first output queue, and once you get your element you switch to the next queue. Since each worker processes its own input queue in FIFO order, reading the output queues in the same rotation gives back the original order.

All the queues need to be blocking.
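A minimal sketch of this round-robin scheme, assuming a known item count so each worker knows when to stop (a real pipeline would more likely send a poison-pill item). RoundRobinPipeline and process are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Worker k owns input queue k and output queue k. Inputs are dealt round-robin,
// and outputs are read back in the same rotation, which restores the input order.
class RoundRobinPipeline {

    static <A, B> List<B> process(List<A> items, int workers, Function<A, B> work)
            throws InterruptedException {
        List<BlockingQueue<A>> in = new ArrayList<>();
        List<BlockingQueue<B>> out = new ArrayList<>();
        for (int k = 0; k < workers; k++) {
            in.add(new LinkedBlockingQueue<>());
            out.add(new LinkedBlockingQueue<>());
        }

        List<Thread> threads = new ArrayList<>();
        for (int k = 0; k < workers; k++) {
            BlockingQueue<A> myIn = in.get(k);
            BlockingQueue<B> myOut = out.get(k);
            // Worker k receives items k, k + workers, k + 2 * workers, ...
            int count = (items.size() - k + workers - 1) / workers;
            Thread t = new Thread(() -> {
                try {
                    for (int n = 0; n < count; n++) {
                        myOut.put(work.apply(myIn.take())); // FIFO within this worker
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        for (int i = 0; i < items.size(); i++) {
            in.get(i % workers).put(items.get(i)); // deal inputs round-robin
        }
        List<B> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            result.add(out.get(i % workers).take()); // read outputs in the same rotation
        }
        for (Thread t : threads) {
            t.join();
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add(i);
        }
        System.out.println(process(items, 3, x -> x * 10)); // [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
    }
}
```

The price of this design is head-of-line blocking: if one worker gets a slow item, the collector waits on that worker's output queue even when the other workers already have results ready.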




Answer 5:


Pump all your Futures through a BlockingQueue. Here's all the code you need:

public class SequentialProcessor implements Consumer<Task> {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final BlockingDeque<Future<Result>> queue = new LinkedBlockingDeque<>();

    public SequentialProcessor(Consumer<Result> listener) {
        new Thread(() -> {
            while (true) {
                try {
                    listener.accept(queue.take().get());
                } catch (InterruptedException | ExecutionException e) {
                    // handle the exception however you want, perhaps just logging it
                }
            }
        }).start();
    }

    public void accept(Task task) {
        queue.add(executor.submit(callableFromTask(task)));
    }

    private Callable<Result> callableFromTask(Task task) {
        return <how to create a Result from a Task>; // implement this however
    }
}

Then to use, create a SequentialProcessor (once):

SequentialProcessor processor = new SequentialProcessor(whatToDoWithResults);

and pump tasks to it:

Stream<Task> tasks; // given this

tasks.forEach(processor); // simply this

I created the callableFromTask() method for illustration, but you can dispense with it if getting a Result from a Task is simple, by using a lambda or method reference instead.

For example, if Task had a getResult() method, do this:

queue.add(executor.submit(task::getResult));

or if you need an expression (lambda):

queue.add(executor.submit(() -> task.getValue() + "foo")); // or whatever
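To try the pattern end to end, here is a sketch with the placeholders filled in. The choices Task = Integer and Result = String, the "result-" formatting standing in for callableFromTask, the daemon drainer thread and the shutdown() method are all assumptions added for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.IntStream;

class IntSequentialProcessor implements Consumer<Integer> {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final BlockingQueue<Future<String>> queue = new LinkedBlockingQueue<>();
    private final Thread drainer;

    IntSequentialProcessor(Consumer<String> listener) {
        drainer = new Thread(() -> {
            while (true) {
                try {
                    // Futures leave the queue in submission order, so results do too.
                    listener.accept(queue.take().get());
                } catch (InterruptedException e) {
                    return; // shutdown() interrupts the drainer
                } catch (ExecutionException e) {
                    e.printStackTrace(); // one task failed; keep delivering the rest
                }
            }
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    @Override
    public void accept(Integer task) {
        queue.add(executor.submit(() -> "result-" + task)); // stand-in for callableFromTask
    }

    void shutdown() {
        executor.shutdown();
        drainer.interrupt();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 20;
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch delivered = new CountDownLatch(n);
        IntSequentialProcessor processor =
                new IntSequentialProcessor(r -> { results.add(r); delivered.countDown(); });
        IntStream.range(0, n).boxed().forEach(processor); // pump tasks, as in the answer
        delivered.await();
        processor.shutdown();
        System.out.println(results); // result-0 through result-19, in order
    }
}
```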



Answer 6:


I think you need an extra queue that records the incoming order, e.g. an IncomingOrderQueue.

When workers finish the objects, they put them into some storage, for example a Map. Another thread consumes the IncomingOrderQueue, takes the ids (hashes) of the objects from it in order, and collects the corresponding objects from that HashMap.

This solution can easily be implemented without an ExecutorService.
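One way this could look, as a sketch using only synchronized, wait() and notifyAll(). The class OrderQueueCollector, its methods, and the use of long sequence ids instead of hashes are my own assumptions; sequence ids avoid the collisions that hashes of equal objects would cause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Ids are recorded in arrival order; workers park finished results in a map;
// a single collector takes ids in order and waits until each result is present.
class OrderQueueCollector {
    private final BlockingQueue<Long> incomingOrder = new LinkedBlockingQueue<>();
    private final Map<Long, String> finished = new HashMap<>(); // guarded by `this`

    // Producer: record the id, in input order, before the item goes to a worker.
    void recordArrival(long id) {
        incomingOrder.add(id);
    }

    // Any worker: store a finished result and wake the collector.
    synchronized void complete(long id, String result) {
        finished.put(id, result);
        notifyAll();
    }

    // Collector thread only: the next result in arrival order, blocking until it exists.
    String next() throws InterruptedException {
        long id = incomingOrder.take();
        synchronized (this) {
            while (!finished.containsKey(id)) {
                wait();
            }
            return finished.remove(id);
        }
    }

    static List<String> run(int n, int workers) throws InterruptedException {
        OrderQueueCollector collector = new OrderQueueCollector();
        BlockingQueue<Long> work = new LinkedBlockingQueue<>();
        for (long id = 0; id < n; id++) {
            collector.recordArrival(id);
            work.add(id);
        }
        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                Long id;
                while ((id = work.poll()) != null) {
                    collector.complete(id, "done-" + id); // stand-in for real processing
                }
            }).start();
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(collector.next());
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 3)); // done-0 through done-9, in order
    }
}
```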




Answer 7:


Reactive programming could help. In my brief experience with RxJava I found it intuitive and easier to work with than core language features like Future. Your mileage may vary. Here is a helpful starting point: https://www.youtube.com/watch?v=_t06LRX0DV0

The attached example also shows how this could be done. In the example below we have Packets that need to be processed. They go through a simple transformation and are finally merged into one list. The output appended to this message shows that the Packets are received and transformed at different points in time, but in the end they are output in the order in which they were received.

import static java.time.Instant.now;
import static rx.schedulers.Schedulers.io;

import java.time.Instant;
import java.util.List;
import java.util.Random;

import rx.Observable;
import rx.Subscriber;

public class RxApp {

  public static void main(String... args) throws InterruptedException {

    List<ProcessedPacket> processedPackets = Observable.range(0, 10) //
        .flatMap(i -> {
          return getPacket(i).subscribeOn(io());
        }) //
        .map(Packet::transform) //
        .toSortedList() //
        .toBlocking() //
        .single();

    System.out.println("===== RESULTS =====");
    processedPackets.stream().forEach(System.out::println);
  }

  static Observable<Packet> getPacket(Integer i) {
    return Observable.create((Subscriber<? super Packet> s) -> {
      // simulate latency
      try {
        Thread.sleep(new Random().nextInt(5000));
      } catch (Exception e) {
        e.printStackTrace();
      }
      System.out.println("packet requested for " + i);
      s.onNext(new Packet(i.toString(), now()));
      s.onCompleted();
    });
  }

}


class Packet {
  String aString;
  Instant createdOn;

  public Packet(String aString, Instant time) {
    this.aString = aString;
    this.createdOn = time;
  }

  public ProcessedPacket transform() {
    System.out.println("                          Packet being transformed " + aString);
    try {
      Thread.sleep(new Random().nextInt(5000));
    } catch (Exception e) {
      e.printStackTrace();
    }
    ProcessedPacket newPacket = new ProcessedPacket(this, now());
    return newPacket;
  }

  @Override
  public String toString() {
    return "Packet [aString=" + aString + ", createdOn=" + createdOn + "]";
  }
}


class ProcessedPacket implements Comparable<ProcessedPacket> {
  Packet p;
  Instant processedOn;

  public ProcessedPacket(Packet p, Instant now) {
    this.p = p;
    this.processedOn = now;
  }

  @Override
  public int compareTo(ProcessedPacket o) {
    return p.createdOn.compareTo(o.p.createdOn);
  }

  @Override
  public String toString() {
    return "ProcessedPacket [p=" + p + ", processedOn=" + processedOn + "]";
  }

}

Deconstruction

Observable.range(0, 10) //
    .flatMap(i -> {
      return getPacket(i).subscribeOn(io());
    }) // source the input as observables on multiple threads


    .map(Packet::transform) // processing the input data 

    .toSortedList() // sorting to sequence the processed inputs; 
    .toBlocking() //
    .single();

On one particular run, packets were received in the order 2, 6, 0, 1, 8, 7, 5, 9, 4, 3 and transformed in the order 2, 6, 0, 1, 3, 4, 5, 7, 8, 9 on different threads:

packet requested for 2
                          Packet being transformed 2
packet requested for 6
                          Packet being transformed 6
packet requested for 0
packet requested for 1
                          Packet being transformed 0
packet requested for 8
packet requested for 7
packet requested for 5
packet requested for 9
                          Packet being transformed 1
packet requested for 4
packet requested for 3
                          Packet being transformed 3
                          Packet being transformed 4
                          Packet being transformed 5
                          Packet being transformed 7
                          Packet being transformed 8
                          Packet being transformed 9
===== RESULTS =====
ProcessedPacket [p=Packet [aString=2, createdOn=2016-04-14T13:48:52.060Z], processedOn=2016-04-14T13:48:53.247Z]
ProcessedPacket [p=Packet [aString=6, createdOn=2016-04-14T13:48:52.130Z], processedOn=2016-04-14T13:48:54.208Z]
ProcessedPacket [p=Packet [aString=0, createdOn=2016-04-14T13:48:53.989Z], processedOn=2016-04-14T13:48:55.786Z]
ProcessedPacket [p=Packet [aString=1, createdOn=2016-04-14T13:48:54.109Z], processedOn=2016-04-14T13:48:57.877Z]
ProcessedPacket [p=Packet [aString=8, createdOn=2016-04-14T13:48:54.418Z], processedOn=2016-04-14T13:49:14.108Z]
ProcessedPacket [p=Packet [aString=7, createdOn=2016-04-14T13:48:54.600Z], processedOn=2016-04-14T13:49:11.338Z]
ProcessedPacket [p=Packet [aString=5, createdOn=2016-04-14T13:48:54.705Z], processedOn=2016-04-14T13:49:06.711Z]
ProcessedPacket [p=Packet [aString=9, createdOn=2016-04-14T13:48:55.227Z], processedOn=2016-04-14T13:49:16.927Z]
ProcessedPacket [p=Packet [aString=4, createdOn=2016-04-14T13:48:56.381Z], processedOn=2016-04-14T13:49:02.161Z]
ProcessedPacket [p=Packet [aString=3, createdOn=2016-04-14T13:48:56.566Z], processedOn=2016-04-14T13:49:00.557Z]
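For comparison, the same idea in plain Java without RxJava (a swapped-in technique, not the answer's code): results are collected in completion order and then sorted. This sketch sorts by an explicit input index rather than a timestamp, which is my own assumption; timestamps from Instant.now() taken close together can be equal, and they give arrival order rather than input order. SortAfterProcessing, Indexed and process are hypothetical names, and upper-casing stands in for the real work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SortAfterProcessing {

    // A result tagged with the index its input had.
    static final class Indexed {
        final int index;
        final String value;

        Indexed(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    static List<String> process(List<String> inputs, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<Indexed> done = new ConcurrentLinkedQueue<>(); // filled in completion order
        for (int i = 0; i < inputs.size(); i++) {
            int index = i;
            pool.execute(() -> done.add(new Indexed(index, inputs.get(index).toUpperCase())));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("processing timed out");
        }

        // The equivalent of toSortedList(): restore the input order by index.
        List<Indexed> sorted = new ArrayList<>(done);
        sorted.sort(Comparator.comparingInt((Indexed r) -> r.index));
        List<String> out = new ArrayList<>();
        for (Indexed r : sorted) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(Arrays.asList("a", "b", "c", "d"), 3)); // [A, B, C, D]
    }
}
```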



Answer 8:


You could launch a DoTask thread for every WorkItem. The thread processes the work; when it is done, it tries to post the item while synchronized on a controlling object, checking whether its ID is the next one due and waiting if it is not.

The post implementation can be something like:

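synchronized (controllingObject) {
    // wait until every item with a smaller id has been posted
    while (workItem.id != nextId) {
        controllingObject.wait(); // InterruptedException propagates to the caller
    }
    // Post the workItem
    nextId++;
    controllingObject.notifyAll(); // wake the threads waiting for their turn
}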
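A self-contained sketch of this approach, using only synchronized, wait() and notifyAll(). The names OrderedPoster, post and run are hypothetical, and building a string stands in for the real work:

```java
import java.util.ArrayList;
import java.util.List;

class OrderedPoster {
    private final Object controllingObject = new Object();
    private int nextId = 0;                                // guarded by controllingObject
    private final List<String> posted = new ArrayList<>(); // output, in id order

    // Called by a DoTask thread once its item is processed.
    void post(int id, String result) throws InterruptedException {
        synchronized (controllingObject) {
            while (id != nextId) {
                controllingObject.wait(); // not our turn yet
            }
            posted.add(result);
            nextId++;
            controllingObject.notifyAll(); // let the owner of id + 1 proceed
        }
    }

    static List<String> run(int n) throws InterruptedException {
        OrderedPoster poster = new OrderedPoster();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int id = i;
            Thread t = new Thread(() -> {
                String result = "item-" + id; // stand-in for the real work
                try {
                    poster.post(id, result);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // after join(), every post() is visible here
        }
        return poster.posted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10)); // item-0 through item-9, in id order
    }
}
```

Starting one thread per item keeps the code simple, but for work items that take about 0.2 ms, thread creation may well cost more than the work itself.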



Answer 9:


Preprocess: add an order value to each item, and allocate the output array if it is not allocated yet.

Input: a queue (threads take samples concurrently, with order values 1, 2, 3, 4, ...; it doesn't matter which thread gets which sample).

Output: an array (each thread writes to the element indexed by the sample's order value; a synchronization point waits for all threads at the end; no collision checks are needed, since every thread writes to different positions).

Postprocess: convert the array to a queue.

This needs an n-element array for n threads, or an array sized to some multiple of n so that postprocessing is done only once.
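A sketch of these steps, assuming the whole batch fits in one array (so postprocessing runs once), a ConcurrentLinkedQueue of order values as the input queue, Thread.join() as the synchronization point, and upper-casing as placeholder work. IndexedArrayOutput and process are hypothetical names; order values start at 0 here so they can be used directly as array indices.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class IndexedArrayOutput {

    static Queue<String> process(List<String> items, int threads) throws InterruptedException {
        // Preprocess: the order value of each item is its index; allocate the output array.
        Queue<Integer> input = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < items.size(); i++) {
            input.add(i);
        }
        String[] results = new String[items.size()];

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                Integer i;
                // Any thread may take any item; each writes only to its own slot.
                while ((i = input.poll()) != null) {
                    results[i] = items.get(i).toUpperCase(); // placeholder work
                }
            });
            workers[t].start();
        }
        // Synchronization point: join() also makes every array write visible here.
        for (Thread w : workers) {
            w.join();
        }

        // Postprocess: the array is already in input order; turn it into a queue.
        return new ArrayDeque<>(Arrays.asList(results));
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(Arrays.asList("a", "b", "c", "d", "e"), 2)); // [A, B, C, D, E]
    }
}
```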



Source: https://stackoverflow.com/questions/36433652/multithreaded-execution-where-order-of-finished-work-items-is-preserved
