Fastest way to perform a subset test on a large collection of sets with the same domain

名媛妹妹 2021-02-10 02:26

Assume we have trillions of sets stored somewhere. All of these sets share the same domain, which is finite and discrete, so each set may be stored as a bit field (e.g. 0

6 answers

不思量自难忘° (original poster)

2021-02-10 02:56

Depending on the cardinality of the domain from which all the sets are drawn, one option is to build an inverted index that maps each element to the list of sets containing it. Given a set Y, you can then find every stored set that has Y as a subset by looking up the list for each element of Y and intersecting those lists. If each list is kept in sorted order (for example, by numbering the sets in your database 0, 1, 2, and so on), the intersection can be computed fairly efficiently, assuming that no one element is contained in too many sets.
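To make the idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration under stated assumptions: the function names (`build_inverted_index`, `intersect_sorted`, `supersets_of`) are made up for this example, sets are numbered 0, 1, 2, ... in insertion order, and everything fits in RAM, which would not hold for trillions of sets. The lookup starts from the shortest posting list so the intermediate result stays small.

```python
from bisect import bisect_left
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the sorted list of ids of the sets containing it."""
    index = defaultdict(list)
    for set_id, s in enumerate(sets):  # ids 0, 1, 2, ... assigned in order
        for element in s:
            # Ids are appended in increasing order, so each list stays sorted.
            index[element].append(set_id)
    return index


def intersect_sorted(a, b):
    """Intersect two sorted id lists by binary-searching the longer one."""
    if len(a) > len(b):
        a, b = b, a
    result = []
    lo = 0
    for x in a:
        lo = bisect_left(b, x, lo)  # never look left of the previous hit
        if lo == len(b):
            break
        if b[lo] == x:
            result.append(x)
    return result


def supersets_of(index, query):
    """Return the ids of all stored sets that contain every element of query."""
    if not query:
        # Assumption: an empty query matches every set; handle it separately.
        raise ValueError("empty query matches every stored set")
    postings = [index.get(e, []) for e in query]
    postings.sort(key=len)  # start with the rarest element
    result = postings[0]
    for p in postings[1:]:
        if not result:
            break  # nothing can match any more
        result = intersect_sorted(result, p)
    return list(result)


stored = [{1, 2, 3}, {2, 3}, {3, 4}, {1, 2, 3, 4}]
index = build_inverted_index(stored)
print(supersets_of(index, {2, 3}))  # ids of the sets that contain both 2 and 3
```

At real scale the posting lists would presumably live on disk or in a distributed store and be compressed, but the lookup-then-intersect structure stays the same.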
