Given a list of objects with multiple attributes I need to find the list of sets created by a union of all intersecting subsets.
Specifically these are Person object
First, is there some inherent hierarchy in identifiers, and do contradicting identifiers of a higher sort cancel out the same identifier of a lower sort? For example, if A and B have the same SSN, B and C have the same DLN, and C and D have the same SSN which does not match A and B's SSN, does that mean that there are two groups or one?
Assuming contradictions don't matter, you are dealing with equivalence classes, as user 57368 (unknown Google) states. For equivalence classes, people often turn to the Union-find structure. As for how to perform these unions, it's not immediately trivial because I assume you don't have the direct link A-B when both A and B have the same SSN. Instead, our sets will consist of two kinds of elements. Each (attribute type, attribute value) = attribute
pair is an element. You also have elements corresponding to object
s. When you iterate through the list of attributes for an object, perform the union (object, attribute)
.
One of the important features of the Union-find data structure is that the resulting structure represents the set. It lets you query "What set is A in?" If this is not enough, let us know and we can improve the result.
But the most important feature is that the algorithm has something which resembles constant-time behavior for each union and query operation.