intersection of n vectors

Submitted by 旧时模样 on 2019-12-02 03:33:16

Question


I'm new to programming and I've recently run into a problem: finding the intersection of n vectors of ints, each of which is already sorted. The approach I came up with uses the std::set_intersection function and has a complexity of O(n^2).

My approach uses two vectors: the first starts out holding my first input vector, and the second holds the second input vector. I call std::set_intersection on the two and store the result in the first vector, then empty the second with its clear() member function. I then copy the next input vector into the second vector and repeat the process, finally returning the first vector.
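For reference, the approach described in the question might look roughly like the following minimal sketch. It assumes the inputs arrive as a std::vector<std::vector<int>> with each inner vector sorted in ascending order; the function name intersect_sequential is hypothetical, and the early exit once the running intersection becomes empty is a small addition not mentioned in the question. The result goes into a separate scratch vector because std::set_intersection's output range must not overlap its inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Intersect all vectors in `inputs` (each assumed sorted ascending) by folding
// them one at a time into a running intersection, reusing two buffers.
std::vector<int> intersect_sequential(const std::vector<std::vector<int>>& inputs) {
    if (inputs.empty()) return {};
    std::vector<int> acc = inputs[0];   // running intersection
    std::vector<int> tmp;               // scratch buffer reused on every step
    // Stop early once the running intersection is empty (nothing can be added back).
    for (std::size_t i = 1; i < inputs.size() && !acc.empty(); ++i) {
        tmp.clear();
        std::set_intersection(acc.begin(), acc.end(),
                              inputs[i].begin(), inputs[i].end(),
                              std::back_inserter(tmp));
        acc.swap(tmp);                  // the new result becomes the accumulator
    }
    return acc;
}
```

For example, intersect_sequential({{1, 2, 3, 4}, {2, 3, 4, 5}, {0, 3, 4}}) returns {3, 4}.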

I believe there is a more efficient way of doing this, but at the moment I cannot think of one. Any help would be much appreciated.


Answer 1:


Fortunately, I think a much tighter bound can be placed on the complexity of your algorithm.

The complexity of std::set_intersection on input sets of size n1 and n2 is O(n1 + n2). You could take your original vectors and intersect them in single-elimination tournament style: in the first round, you intersect the 1st and 2nd vectors, the 3rd and 4th, the 5th and 6th, and so forth; in the second round, you intersect the 1st and 2nd intersections, the 3rd and 4th, and so forth; repeat until the final round produces just one intersection. Because the intersection of two vectors is no longer than the shorter of the two, the sum of the sizes of all the vectors surviving each round is no more than half the sum of the sizes of the vectors at the start of the round. So this algorithm takes O(N) time (and O(N) space) altogether, where N is the sum of the sizes of all the original vectors in your input. (It's O(N) because N + N/2 + N/4 + ... < 2N.)

So, given an input consisting of already-sorted vectors, the complexity of the algorithm is O(N).

Your algorithm intersects the vectors in a very different order; I'm not 100% sure it is also O(N), but I strongly suspect that it is.


Edit: Concerning how to actually implement the "tournament" algorithm in C++, it depends on how hard you want to work to optimize this, and somewhat on the nature of your input.

The easiest approach would be to make a new list of vectors: take two vectors from the old list, push a new vector onto the new list, intersect the two old vectors into the new vector, destroy the old vectors, and hope the library manages the memory efficiently.
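A minimal sketch of this easiest version of the tournament, assuming the input is a std::vector<std::vector<int>> of ascending-sorted vectors (the function name intersect_tournament is hypothetical). Each round builds a fresh outer vector, and the previous round's vectors are destroyed when it is replaced. A vector left without a partner in a round is simply carried into the next round unchanged, a detail the description leaves implicit.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Tournament-style intersection: each round pairs up neighbouring vectors,
// intersects each pair into a freshly pushed vector, and carries an unpaired
// last vector over unchanged. Inputs are assumed sorted ascending.
std::vector<int> intersect_tournament(std::vector<std::vector<int>> round) {
    if (round.empty()) return {};
    while (round.size() > 1) {
        std::vector<std::vector<int>> next;            // the "new list" for this round
        next.reserve((round.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < round.size(); i += 2) {
            next.emplace_back();                        // push a new, empty vector
            std::set_intersection(round[i].begin(), round[i].end(),
                                  round[i + 1].begin(), round[i + 1].end(),
                                  std::back_inserter(next.back()));
        }
        if (round.size() % 2 == 1)                      // odd one out advances a round
            next.push_back(std::move(round.back()));
        round = std::move(next);                        // old vectors are destroyed here
    }
    return std::move(round.front());
}
```

For five inputs this runs three rounds (5 vectors, then 3, then 2, then 1).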

If you want to reduce the allocation of new vectors, then reusing vectors (as you already thought to do) might help. If the input data structure is a std::list<std::vector<int> >, for example, you could start each round by pushing one empty vector onto the front of the list. Make three iterators: one to the new vector, and one to each of the first two original vectors in the list. Intersect the vectors at the last two iterators, writing the result to the vector at the first iterator, then clear the vectors at the last two iterators. Move the last two iterators forward two places each, and move the first iterator forward one place. Repeat. If you reach a state where one of the last two iterators has reached end() but the other has not, erase all the list elements from the first iterator up to, but not including, the other iterator; if both reach end() together, erase from the first iterator to end(). Now you have a list of vectors again and can repeat as long as there is more than one vector in the list.
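The list-based variant could be sketched as follows. This is one possible reading of the description, assuming a std::list<std::vector<int>> of ascending-sorted vectors; the function name intersect_list_inplace is hypothetical. To avoid stepping an iterator past end(), it checks for the end of the list before forming each pair, and it handles an even number of vectors by erasing up to end().

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Tournament intersection over a std::list that reuses the input vectors in
// place: each round allocates only one extra (initially empty) vector.
// Inputs are assumed sorted ascending; `vs` is consumed.
std::vector<int> intersect_list_inplace(std::list<std::vector<int>> vs) {
    if (vs.empty()) return {};
    while (vs.size() > 1) {
        vs.emplace_front();                 // empty vector to receive the first result
        auto out = vs.begin();              // where the next result is written
        auto a = std::next(out);            // first vector of the current pair
        while (a != vs.end()) {
            auto b = std::next(a);          // second vector of the current pair
            if (b == vs.end()) break;       // odd count: a is left unpaired
            out->clear();                   // slot is empty or already consumed
            std::set_intersection(a->begin(), a->end(), b->begin(), b->end(),
                                  std::back_inserter(*out));
            a->clear();
            b->clear();
            ++out;                          // write position moves forward one place
            a = std::next(b);               // pair moves forward two places
        }
        // Drop the cleared slots between the last result and either the
        // unpaired vector (odd count) or end() (even count).
        vs.erase(out, a);
    }
    return std::move(vs.front());
}
```

The write iterator always trails the pair iterators, so it only ever lands on slots whose contents have already been consumed.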

If the input is a std::vector<std::vector<int> >, then inserting an element at the front is relatively expensive, so you might want a slightly more complicated algorithm. There are lots of choices, and no really obvious winners that I can think of.
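As one illustration of such a choice, not necessarily the best one, a std::vector<std::vector<int>> can avoid front insertion entirely by compacting each round's results into the leading slots of the same outer vector and then shrinking it. The sketch below assumes ascending-sorted inputs; the name intersect_vector_inplace is hypothetical. Swapping through a single scratch vector lets existing buffers be recycled rather than freshly allocated.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Tournament intersection for std::vector<std::vector<int>>: each round's
// results are compacted into the front slots of the same outer vector, which
// is then shrunk. Inputs are assumed sorted ascending; `vs` is consumed.
std::vector<int> intersect_vector_inplace(std::vector<std::vector<int>> vs) {
    if (vs.empty()) return {};
    std::vector<int> scratch;                      // reused output buffer
    while (vs.size() > 1) {
        std::size_t w = 0;                         // next slot to write into
        std::size_t i = 0;                         // first vector of the current pair
        for (; i + 1 < vs.size(); i += 2) {
            scratch.clear();
            std::set_intersection(vs[i].begin(), vs[i].end(),
                                  vs[i + 1].begin(), vs[i + 1].end(),
                                  std::back_inserter(scratch));
            vs[w++].swap(scratch);                 // slot w <= i was already read
        }
        if (i < vs.size())                         // odd count: carry the last vector
            vs[w++].swap(vs[i]);
        vs.resize(w);                              // drop the consumed tail
    }
    return std::move(vs.front());
}
```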




Answer 2:


Here is another analysis that shows that your algorithm is already linear.

Suppose you have some collection of vectors and the algorithm repeatedly selects some two vectors from the collection and replaces them with their intersection, until there is one vector left. Your method fits this description. I argue that any such algorithm will spend, in total, linear time in all executions of set_intersection.

Suppose set_intersection takes at most A * (x + y) operations for vectors of sizes x and y.

Let K be the sum of the lengths of all vectors in the collection. It starts as the size of the input, n (here, the total number of elements across all vectors), and it cannot fall below zero, so it can change by at most n.

Every time vectors of sizes x and y are combined, the value of K decreases by at least (x + y)/2, because the result is no longer than the shorter input: its length is at most min(x, y) <= (x + y)/2. Summing this over all calls gives sum { (x + y)/2 } <= n, since K cannot change by more than n.

From this we can derive that sum { A * (x + y) } <= 2 * A * n = O(n). The left-hand side is the total time spent in set_intersection.
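To make the bound concrete, here is a small instrumented sketch. It swaps std::set_intersection for a hand-written two-pointer loop (an assumption, made only so the work can be counted); every loop iteration advances at least one input, so each call costs at most x + y steps, i.e. A = 1. The names intersect_counted and total_steps_sequential are hypothetical. By the argument above, the total step count of the question's left-to-right fold should never exceed 2 * n.

```cpp
#include <cstddef>
#include <vector>

// Two-pointer intersection of sorted vectors that adds its loop iterations to
// `steps`. Each iteration advances i, j, or both, so it runs at most x + y times.
std::vector<int> intersect_counted(const std::vector<int>& a, const std::vector<int>& b,
                                   std::size_t& steps) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        ++steps;
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { out.push_back(a[i]); ++i; ++j; }
    }
    return out;
}

// Fold the vectors left to right (the question's order) and return the total
// number of steps spent across all intersections.
std::size_t total_steps_sequential(const std::vector<std::vector<int>>& vs) {
    std::size_t steps = 0;
    if (vs.empty()) return 0;
    std::vector<int> acc = vs[0];
    for (std::size_t k = 1; k < vs.size(); ++k)
        acc = intersect_counted(acc, vs[k], steps);
    return steps;
}
```

For example, the input {{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}, {4}} has n = 12 elements in total, and the fold takes 9 steps, within the 2 * 12 = 24 bound.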

In less formal language: to spend x + y time in set_intersection, you need to remove at least (x + y)/2 elements from your collection, so spending more than linear time in set_intersection would make you run out of elements.



Source: https://stackoverflow.com/questions/29319653/intersection-of-n-vectors
