问题
I have got this problem:
- Find the first element in a list, for which a given condition holds.
Unfortunately, the list is quite long (100.000 elements), and evaluation the condition for each element takes in total about 30 seconds using one single Thread.
Is there a way to cleanly parallelize this problem? I have looked through all the tbb patterns, but could not find any fitting.
UPDATE: for performance reason, I want to stop as early as possible when an item is found and stop processing the rest of the list. That's why I believe I cannot use parallel_while
or parallel_do
.
回答1:
I'm not too familiar with libraries for this, but just thinking aloud, could you not have a group of threads iterating at different at the same stride from different staring points?
Say you decide to have n
threads (= number of cores or whatever), each thread should be given a specific starting point up to n
, so the first thread starts on begin()
, the next item it compares is begin() + n
, etc. etc. second thread starts on begin()+1
and then it's next comparison is in n
too etc.
This way you can have a group of threads iterating in parallel through the list, the iteration itself is presumably not expensive - just the comparison. No node will be compared more than once and you can have some condition which is set when a match is made by any of the threads and all should check this condition before iterating/comparing..
I think it's pretty straightforward to implement(?)
回答2:
I think the best way to solve this problem with TBB is parallel_pipeline
.
There should be (at least) two stages in the pipeline. The 1st stage is serial; it just reads the next element from the list and passes it to the 2nd stage. This 2nd stage is parallel; it evaluates the condition of interest for a given element. As soon as the condition is met, the second stage sets a flag (which should be either atomic or protected with a lock) to indicate that a solution is found. The first stage must check this flag and stop reading the list once the solution is found.
Since condition evaluation is performed in parallel for a few elements, it can happen that a found element is not the first suitable one in the list. If this is important, you also need to keep an index of the element, and when a suitable solution is found you detect whether its index is less than that of a previously known solution (if any).
HTH.
回答3:
ok, I have done it this way:
- Put all elements into a
tbb::concurrent_bounded_queue<Element> elements
. - Create an empty
tbb::concurrent_vector<Element> results
. - Create a
boost::thread_group
, and create several threads that run this logic:
logic to run in parallel:
Element e;
while (results.empty() && elements.try_pop(e) {
if (slow_and_painfull_check(e)) {
results.push_back(e);
}
}
So when the first element is found, all other threads will stop processing the next time they check results.empty()
.
It is possible that two or more threads are working on an element for which slow_and_painfull_check
returns true, so I just put the result into a vector and deal with this outside of the parallel loop.
After all threads in the thread group have finished, I check all elements in the results
and use the one that comes first.
回答4:
you can take a look at http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html for parallel algorithms implementations. And in particular you need find_if algorithm http://www.cplusplus.com/reference/algorithm/find_if/
回答5:
I see two opportunities for parallelism here: evaluating one element on multiple threads, or evaluating multiple elements at once on different threads.
There isn't enough information to determine the difficulty nor the effectiveness of evaluating one element on multiple threads. If this is easy, the 30 second per element time could be reduced.
I do not see a clean fit into TBB for this problem. There are issues with lists not having random access iterators, determining when to stop, and guaranteeing the first element is found. There may be some games you can play with the ranges to get it to work though.
You could use some lower level thread constructs to implement this yourself as well, but there are a number of places for incorrect results to be returned. To prevent such errors, I would recommend using an existing algorithm. You could convert the list to an array (or some other structure with random access iterators) and use the experimental libstdc++ Parellel Mode find_if algorithm user383522 referenced.
回答6:
If it's a linked list, A parallel search isn't going to add much speed. However, linked lists tend to perform poorly with caches. You may get a tiny performance increase if you have two threads: one does the find_first_element, and one simply iterates through the list, making sure not to get more than X (100?) ahead of the first thread. The second thread doesn't do any comparisons, but will assure that the items are cached as well as possible for the first thread. This may help your time, or it might make little difference, or it might hinder. Test everything.
回答7:
Can't you transform the list to a balanced tree or similar? Such data structures are easier to process in parallel - usually you get back the overhead you may have paid in making it balanced in the first time... For example, if you write functional-style code, check this paper: Balanced trees inhabiting functional parallel programming
回答8:
If you are using GCC, GNU OpenMP provides parallel std functions link
回答9:
I've never heard of the Intel tbb library but a quick open and scan of the Tutorial led me to parallel_for
which seems like it will do the trick.
来源:https://stackoverflow.com/questions/7726864/tbb-parallel-find-first-element