Union of All Intersecting Sets

花落未央 2021-01-20 08:08

Given a list of objects with multiple attributes, I need to find the list of sets created by a union of all intersecting subsets.

Specifically these are Person object

5 answers

生来不讨喜 (original poster)

2021-01-20 08:57
So your collection example could look like this:
```
A { ss |-> 42, dl |-> 123 }
B { ss |-> 42, dl |-> 456 }
C { ss |-> 23, dl |-> 456 }
D { ss |-> 89, dl |-> 789 }
E { ss |-> 89, dl |-> 432 }
```
I would then suggest an algorithm that builds up multi-collections by incrementally merging or inserting each collection into the existing multi-collections:

Iteration 1. The first collection becomes the only multi-collection:
```
{A} { ss |-> [42], dl |-> [123] }
```
Iteration 2. Merge the next collection into the first, since its SSN is already present:
```
{A,B} { ss |-> [42], dl |-> [123,456] }
```
Iteration 3. Merge again, since the DLN is already present:
```
{A,B,C} { ss |-> [23,42], dl |-> [123,456] }
```
Iteration 4. Insert a new multi-collection since there is no match:
```
{A,B,C} { ss |-> [23,42], dl |-> [123,456] }
{D}     { ss |-> [89],    dl |-> [789]     }
```
Iteration 5. Merge with the second multi-collection, since the SSN is already present:
```
{A,B,C} { ss |-> [23,42], dl |-> [123,456] }
{D,E}   { ss |-> [89],    dl |-> [432,789] }
```
So in each iteration (one for each collection), you must identify all multi-collections that have values in common with the collection you are processing, and merge all these together.
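The iterations above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `merge_collections`, the `records` dictionary, and the representation of a multi-collection as a pair (member names, attribute value sets) are all assumptions made for the example. It relies on the fact that existing multi-collections never share values with each other, so only the new collection's own values need to be checked.

```python
def merge_collections(records):
    """Incrementally build multi-collections from named attribute records.

    `records` maps a name to a dict of attribute -> value (hypothetical layout).
    Returns a list of (member_names, {attribute: set_of_values}) pairs.
    """
    groups = []
    for name, attrs in records.items():
        # Start a candidate multi-collection holding only the new record.
        members = {name}
        values = {key: {value} for key, value in attrs.items()}
        untouched = []
        for g_members, g_values in groups:
            # A group matches if it already holds any of the record's values.
            if any(attrs[key] in g_values.get(key, ()) for key in attrs):
                members |= g_members
                for key, vals in g_values.items():
                    values.setdefault(key, set()).update(vals)
            else:
                untouched.append((g_members, g_values))
        # Every matching group has been folded into the candidate.
        untouched.append((members, values))
        groups = untouched
    return groups


# The example data from the answer.
records = {
    "A": {"ss": 42, "dl": 123},
    "B": {"ss": 42, "dl": 456},
    "C": {"ss": 23, "dl": 456},
    "D": {"ss": 89, "dl": 789},
    "E": {"ss": 89, "dl": 432},
}
groups = merge_collections(records)
for members, values in groups:
    print(sorted(members), {k: sorted(v) for k, v in values.items()})
```

Each pass scans every existing multi-collection once, which matches the quadratic behaviour discussed next.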

In general, if there are n collections, each with a constant number k of attributes, this algorithm runs in time O(n·n·k) = O(n²). The worst case is exhibited when all attribute values are distinct, because then nothing is ever merged and each new collection must be compared against every existing multi-collection. When more attribute values are shared, the time it takes to insert into and test membership in the attribute value sets (like [23,42]) becomes the dominant factor, so the attribute value sets should be implemented efficiently.

If you use optimal disjoint sets, then each Find or Merge operation will run in amortized time O(α(n)), where α is the inverse Ackermann function.
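For reference, a disjoint-set structure with path compression and union by size might look like the sketch below. The class `DisjointSet`, the helper `group_records`, and the `owner` dictionary are hypothetical names introduced for this example. Note that this is a variant rather than the exact structure described in the answer: it keeps a disjoint set over the record names and uses a dictionary from each (attribute, value) pair to one record that carries it.

```python
class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        # Register x as its own singleton set if it is new.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Locate the root, then point every node on the path at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        # Attach the smaller tree under the larger one.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def group_records(records):
    """Group record names that are linked through any shared attribute value."""
    ds = DisjointSet()
    owner = {}  # (attribute, value) -> one record that carries this value
    for name, attrs in records.items():
        ds.add(name)
        for key, value in attrs.items():
            if (key, value) in owner:
                ds.union(name, owner[(key, value)])
            else:
                owner[(key, value)] = name
    groups = {}
    for name in records:
        groups.setdefault(ds.find(name), set()).add(name)
    return list(groups.values())


records = {
    "A": {"ss": 42, "dl": 123},
    "B": {"ss": 42, "dl": 456},
    "C": {"ss": 23, "dl": 456},
    "D": {"ss": 89, "dl": 789},
    "E": {"ss": 89, "dl": 432},
}
groups = group_records(records)
print(sorted(sorted(g) for g in groups))
```

In this variant each record performs k dictionary lookups and at most k unions, so it avoids scanning all multi-collections per record; whether that fits depends on how the attribute values can be hashed.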

In each iteration there are at most n multi-collections (the case where no multi-collections have been merged so far). To integrate the new collection into the multi-collections, you have to perform a Find operation on each of the k sets of every multi-collection to identify all multi-collections to be merged, which takes time bounded by O(nkα(n)). Merging the at most k multi-collections found this way takes O(k²α(n)).

So over all n iterations the time is bounded by O(n(nkα(n)+k²α(n))) = O(n(nkα(n))) = O(n²kα(n)) = O(n²α(n)), since k is a constant.

Because α(n) for all practical purposes is also a constant, the total time is bounded by O(n²).