I have a large number of user IDs (integers), potentially millions. These users all belong to various groups (sets of integers), such that there are on the order of 10 million
If the vast majority of intersections are 0, that means the number of non-empty intersections is relatively small. Give this a try:
map, int>
n*(n-1)/2
entries of that map, where n is the number of sets to which the user belongs.It will use more memory than the simple approach of computing every intersection. In fact it will run up against what's feasible: if each set on average intersects with just 10 others, perhaps in very small intersections, then the map needs 50M entries, which is starting to be a lot of RAM. It's also woefully cache-unfriendly.
It might be faster than doing all the set-intersections, because the O(n^2) terms relate to the number of non-empty intersections and the number of groups to which each user belongs, rather than to the number of sets.
Parallelizing isn't trivial, because of the contention on the giant map. However, you can shard that into a map for each thread, and periodically give one thread a new, empty, map and add the results-so-far into the total results. The different threads then run completely independently most of the time, each given a list of users to process.