I'm new to programming and I've recently come across an issue with finding the intersection of n vectors of ints, each of which is already sorted. The approach that I came up with has a complexity of O(n^2), and I am using the std::set_intersection function.
My approach uses two working vectors: the first starts out as the first input vector, and the second as the second input vector. I call set intersection on the two and overwrite the first vector with the result, then call clear on the second. I then copy the next input vector into the second and repeat the process, eventually returning the first vector.
I do believe there is a more efficient way of going about this, but at the moment I cannot think of one. Any help on this issue would be much appreciated.
Fortunately, I think a much tighter bound can be placed on the complexity of your algorithm.
The complexity of std::set_intersection on input sets of size n1 and n2 is O(n1 + n2).
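To make the building block concrete, here is a minimal sketch of intersecting two already-sorted vectors. The helper name intersect_sorted is hypothetical (it is not part of the standard library); only std::set_intersection and std::back_inserter are real library names.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: intersects two sorted vectors.
// std::set_intersection walks both ranges once, so the cost is
// linear in a.size() + b.size().
std::vector<int> intersect_sorted(const std::vector<int>& a,
                                  const std::vector<int>& b) {
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

Both inputs must be sorted with the same ordering, otherwise the result is not meaningful.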
You could take your original vectors and intersect them in single-elimination
tournament style, that is, on the first round you intersect the 1st and 2nd
vectors, the 3rd and 4th, the 5th and 6th, and so forth; on the
second round you intersect the 1st and 2nd intersections, the 3rd and 4th,
and so forth; repeat until the final round produces just one intersection.
The sum of the sizes of all the vectors surviving each round is no more than
half the sum of the sizes of the vectors at the start of the round,
so this algorithm takes O(N) time (also O(N) space) altogether
where N is the sum of the sizes of all the original vectors in your input.
(It's O(N) because N + N/2 + N/4 + ... < 2N.)
So, given an input consisting of already-sorted vectors, the complexity of the algorithm is O(N).
Your algorithm merges the vectors in a very different sequence, but while I'm not 100% sure it is also O(N), I strongly suspect that it is.
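For reference, here is a rough sketch of the sequential approach from the question, as I understand it. The function name intersect_sequential is hypothetical, and the early exit once the running result is empty is my own addition rather than something the question describes.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: folds the intersection from left to right,
// reusing two buffers (acc and scratch) instead of allocating per step.
std::vector<int> intersect_sequential(
        const std::vector<std::vector<int>>& inputs) {
    if (inputs.empty()) return {};
    std::vector<int> acc = inputs.front();  // running intersection
    std::vector<int> scratch;               // reused output buffer
    for (std::size_t i = 1; i < inputs.size() && !acc.empty(); ++i) {
        scratch.clear();
        std::set_intersection(acc.begin(), acc.end(),
                              inputs[i].begin(), inputs[i].end(),
                              std::back_inserter(scratch));
        acc.swap(scratch);  // acc now holds the new running intersection
    }
    return acc;
}
```

Each step costs the size of the running result plus the size of the next input, and the running result is never longer than the input vector consumed just before it, so the total work is at most about twice the total input size.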
Edit: Concerning how to actually implement the "tournament" algorithm in C++, it depends on how hard you want to work to optimize this, and somewhat on the nature of your input.
The easiest approach would be to make a new list of vectors: take two vectors from the old list, push an empty vector onto the new list, write the intersection of the two old vectors into the new vector, destroy the old vectors, and hope the library manages the memory efficiently.
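A minimal sketch of this easiest variant might look like the following. The name intersect_tournament is hypothetical, and I am assuming the input arrives as a std::vector of vectors that the function is allowed to consume (it takes its argument by value).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: single-elimination "tournament" intersection.
// Each round pairs up neighbours and builds a fresh list of results.
std::vector<int> intersect_tournament(std::vector<std::vector<int>> round) {
    if (round.empty()) return {};
    while (round.size() > 1) {
        std::vector<std::vector<int>> next;
        next.reserve((round.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < round.size(); i += 2) {
            std::vector<int> winner;
            std::set_intersection(round[i].begin(), round[i].end(),
                                  round[i + 1].begin(), round[i + 1].end(),
                                  std::back_inserter(winner));
            next.push_back(std::move(winner));
        }
        // With an odd count, the last vector advances unpaired.
        if (round.size() % 2 == 1) {
            next.push_back(std::move(round.back()));
        }
        round = std::move(next);  // old vectors are destroyed here
    }
    return std::move(round.front());
}
```

This allocates a new vector for every pairing, which is the cost the re-use ideas in this answer try to avoid.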
If you want to reduce the allocation of new vectors, then re-using vectors (as you already thought to do) might help. If the input data structure is an std::list<std::vector<int> >, for example, you could start by pushing one empty vector onto the front of this list. Make three iterators: one to the new vector, and one to each of the original first two vectors in the list.
Take the intersection of the vectors at the last two iterators,
writing the result to the first iterator, then clear the vectors at the
last two iterators. Move the last two iterators forward two places each,
move the first iterator forward one place. Repeat. If you reach a state where
one of the last two iterators has reached end() but the other has not,
erase all the list elements between the first iterator and the other iterator.
Now you have a list of vectors again and can repeat as long as there is
more than one vector in the list.
If the input is std::vector<std::vector<int> > then pushing an element onto the front of the vector is relatively expensive, so you might want a slightly more complicated algorithm. There are lots of choices, and no really obvious winners that I can think of.
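One of those choices, sketched under my own assumptions, is to avoid inserting at the front entirely: keep one scratch buffer, write the result of each pairing back into a slot of the outer vector that has already been read, and shrink the outer vector at the end of each round. The name intersect_in_place is hypothetical, and this is only one possible design, not necessarily the best one.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: tournament intersection that reuses the outer
// vector's slots. On return, sets holds at most one vector (the result).
void intersect_in_place(std::vector<std::vector<int>>& sets) {
    std::vector<int> scratch;  // reused output buffer
    while (sets.size() > 1) {
        std::size_t out = 0;   // next slot to write a result into
        std::size_t i = 0;     // first vector of the current pair
        for (; i + 1 < sets.size(); i += 2, ++out) {
            scratch.clear();
            std::set_intersection(sets[i].begin(), sets[i].end(),
                                  sets[i + 1].begin(), sets[i + 1].end(),
                                  std::back_inserter(scratch));
            // out <= i, so slot out has already been consumed as input.
            sets[out].swap(scratch);
        }
        if (i < sets.size()) {  // odd count: the last vector advances unpaired
            sets[out].swap(sets[i]);
            ++out;
        }
        sets.resize(out);       // drop the slots that are no longer needed
    }
}
```

Swapping rather than copying lets the scratch buffer inherit the capacity of a vector that was just consumed, so later pairings in the same round often need no new allocation.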
Here is another analysis that shows that your algorithm is already linear.
Suppose you have some collection of vectors and the algorithm repeatedly selects some two vectors from the collection and replaces them with their intersection, until there is one vector left. Your method fits this description. I argue that any such algorithm will spend, in total, linear time in all executions of set_intersection.
Suppose set_intersection takes at most A * (x + y) operations for vectors of size x and y.
Let K be the sum of the lengths of all vectors in the collection. It starts as the size of the input (n) and it cannot fall below zero, so it can change by at most n.
Every time vectors of sizes (x, y) are combined, the value of K decreases by at least (x + y)/2, as the result cannot be longer than either input. If we sum this over all calls we get that sum { (x + y)/2 } <= n, as K cannot change by more than n.
From this we can derive that sum { A * (x + y) } <= 2 * A * n = O(n). The left side here is the total time spent in set_intersection.
In less formal language: to spend x + y time in set_intersection you need to remove at least (x + y)/2 elements from your collection, so spending more than linear time executing set_intersection would make you run out of elements.
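If you want to check the bound empirically, a small instrumented sketch like the one below adds up x + y over every call. The function total_intersection_work is hypothetical, and the pairing order (always the last two vectors) is an arbitrary choice on my part; by the argument above, any order should give a total of at most 2 * n.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical instrumentation: repeatedly replaces the last two vectors
// with their intersection and returns the sum of (x + y) over all calls.
std::size_t total_intersection_work(std::vector<std::vector<int>> sets) {
    std::size_t work = 0;
    while (sets.size() > 1) {
        std::vector<int> b = std::move(sets.back());
        sets.pop_back();
        std::vector<int> a = std::move(sets.back());
        sets.pop_back();
        work += a.size() + b.size();  // the x + y term for this call
        std::vector<int> both;
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              std::back_inserter(both));
        sets.push_back(std::move(both));
    }
    return work;
}
```

For the input {1,2,3,4}, {2,3,4}, {3,4,5} we have n = 10; the two calls cost 3 + 3 and 4 + 2, for a total of 12, which is within the bound of 20.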
Source: https://stackoverflow.com/questions/29319653/intersection-of-n-vectors