I have a large number of user IDs (integers), potentially millions. These users all belong to various groups (sets of integers), such that there are on the order of 10 million
One way is to see your problem as a metric space radius search problem where the distance function is the number of not matching entries and the radius is r = max(number of elements in sets) - number of equal
. Filtering of the found elements is needed to see that enough values are in the Set. So unless someone comes up with a metric function that can be used directly this solutions has to much constraints.
One of the data structures for metric search is the BK-Tree that can be used for string similarity searches.
Candidates for your problem are VP-tree and M-Trees.
The worst case for metric tree is O(n^2) when you're searching for a distance > m (maximum number of elements in the sets) as you build up the tree in O(log n * n) and search in O(n^2).
Apart for that the actual runtime complexity depends on the abbility to prune subtrees of the metric tree while executing the search. In a metric tree a subtree can be skipped if the distance of the pivot element to the search element is larger than the radius of the pivot element (which is at least the maximum distance of a ancestores to the pivot element). If your entry sets are rather disjunct the overall runtime will be dominated by the build-up time of the metric tree O(log n * n).