Data structure for matching sets

有刺的猬 2021-02-02 00:14

I have an application with a number of sets. A set might be {4, 7, 12, 18}: unique numbers, all less than 50.

I then have several data items:
1 {1,

13 answers
  •  臣服心动
    2021-02-02 00:48

    I see another solution that is dual to yours (yours tests a data item against every set): a binary tree in which each node tests whether one specific number is present in the data item or not.
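    For contrast, the direction the answer calls "yours" (checking one data item against every set in turn) can be sketched in Python with bitmasks, since every number is below 50 and a whole set fits in a single integer. This is a minimal sketch, not the asker's code: `to_mask`, `matching_sets` and `SET_MASKS` are hypothetical names, and it assumes a set matches when all of its elements occur in the data item, which is how the answer's example reads.

```python
# Baseline (the direction the answer calls "yours"): test the data item against
# every set, one subset check per set. Assumption: a set "matches" a data item
# when every element of the set also occurs in the item.
# All numbers are < 50, so each set fits in one int used as a bitmask.

def to_mask(elements):
    """Pack small non-negative ints into a bitmask (bit i set <=> i in elements)."""
    mask = 0
    for e in elements:
        mask |= 1 << e
    return mask


def matching_sets(set_masks, item):
    """Return the names of all sets contained in `item`, scanning every set."""
    item_mask = to_mask(item)
    # S is a subset of D exactly when S has no bit that D lacks.
    return [name for name, m in set_masks.items() if (m & ~item_mask) == 0]


# Hypothetical example data: the sets A, B and C used in this answer.
SET_MASKS = {name: to_mask(s) for name, s in {"A": {2, 3}, "B": {4}, "C": {1, 3}}.items()}

print(matching_sets(SET_MASKS, {1, 4}))     # ['B']
print(matching_sets(SET_MASKS, {1, 2, 3}))  # ['A', 'C']
```

    Each check is only a couple of integer operations, but the scan still touches every set; the tree inverts this, so the work per data item depends on the number of possible elements instead.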

    For instance, if you had the sets A = { 2, 3 }, B = { 4 } and C = { 1, 3 }, you'd have the following tree:

                          __NOT_HAVE_[1]___HAVE____
                          |                       |
                    _____[2]_____           _____[2]_____
                    |           |           |           |
                 __[3]__     __[3]__     __[3]__     __[3]__
                 |     |     |     |     |     |     |     |
                [4]   [4]   [4]   [4]   [4]   [4]   [4]   [4]
                / \   / \   / \   / \   / \   / \   / \   / \
               .   B .   B .   B A   A .   B C   B .   B A   A
                                     B           C       C   B
                                                             C
    

    Once the tree is built, you only need 50 membership tests to classify a data item: one per possible number (in general, one per element of the universe), regardless of how many sets there are.

    For instance, for the data item { 1, 4 } you branch right (it has 1), left (it doesn't have 2), left (no 3) and right (it has 4), and reach the leaf [ B ]: B is the only set included in { 1, 4 }.
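    The construction and the walk can be sketched in a few lines of Python. It is a minimal sketch under assumptions: a set matches when all of its elements are in the data item (as in the { 1, 4 } example), elements are tested in ascending order as in the diagram, and `Leaf`, `Node`, `build` and `classify` are hypothetical names. `build` creates the complete, unreduced tree, so it is only practical for a handful of elements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Leaf:
    sets: list[str]       # names of the sets contained in every item reaching here


@dataclass
class Node:
    element: int          # the element this node tests
    absent: Leaf | Node   # left branch (NOT_HAVE): the item lacks `element`
    present: Leaf | Node  # right branch (HAVE): the item contains `element`


def build(sets, order, chosen=frozenset()):
    """Build the complete decision tree testing the elements of `order` in turn.

    `chosen` holds the elements that the current path says the item contains.
    """
    if not order:
        return Leaf([name for name, s in sets.items() if s <= chosen])
    e, rest = order[0], order[1:]
    return Node(e, build(sets, rest, chosen), build(sets, rest, chosen | {e}))


def classify(tree, item):
    """Walk from the root, making one membership test per level."""
    while isinstance(tree, Node):
        tree = tree.present if tree.element in item else tree.absent
    return tree.sets


SETS = {"A": {2, 3}, "B": {4}, "C": {1, 3}}
tree = build(SETS, (1, 2, 3, 4))
print(classify(tree, {1, 4}))  # ['B']: right, left, left, right
```

    Elements of the data item that are not in the tested order are simply ignored, which is fine as long as the order covers every number that can appear in a set.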

    This is essentially a binary decision diagram (BDD). If the redundancy in the nodes bothers you (as it should: a complete tree over 50 elements has 2^50 leaves), consider the reduced form, the reduced ordered binary decision diagram (ROBDD), which is a commonly used data structure. In it, redundant nodes are merged, so the structure is no longer a binary tree but a directed acyclic graph.
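    A sketch of the reduction in Python follows, under stated assumptions. Because the leaves here hold lists of sets rather than true/false, it is strictly a multi-terminal variant, but it applies the two ROBDD reduction rules: drop a test whose two branches coincide, and store identical nodes only once. `ReducedDiagram` and its methods are hypothetical names, the variable order is simply ascending element value, and building by memoizing on the sets still "alive" is this sketch's own shortcut, not how BDD libraries work (they typically combine smaller diagrams with an apply operation). With many overlapping sets, the number of alive combinations can still grow large.

```python
class ReducedDiagram:
    """Reduced, ordered decision diagram for subset matching (a sketch).

    Terminals hold tuples of set names, so this is a multi-terminal variant
    rather than a true/false ROBDD, but it applies the same two reductions:
      1. a test whose two branches lead to the same node is dropped;
      2. structurally identical nodes are stored once (a "unique table").
    Assumes every element of every set appears in `universe`.
    """

    def __init__(self, sets, universe):
        self.sets = sets
        self.names = list(sets)        # fixed order for names in terminals
        self.order = sorted(universe)  # variable order: ascending elements
        self.nodes = []                # id -> ("leaf", names) or ("test", e, absent_id, present_id)
        self.unique = {}               # node tuple -> id (hash-consing)
        self.memo = {}                 # (depth, alive sets) -> id
        self.root = self._build(0, frozenset(self.names))

    def _make(self, node):
        """Return the id of `node`, creating it only if no identical node exists."""
        if node not in self.unique:
            self.unique[node] = len(self.nodes)
            self.nodes.append(node)
        return self.unique[node]

    def _build(self, depth, alive):
        # `alive`: sets with no element seen missing on the path so far. The
        # subdiagram depends only on (depth, alive), so memoizing on that key
        # avoids walking all 2**len(universe) paths one by one.
        key = (depth, alive)
        if key not in self.memo:
            if depth == len(self.order):
                names = tuple(n for n in self.names if n in alive)
                self.memo[key] = self._make(("leaf", names))
            else:
                e = self.order[depth]
                present = self._build(depth + 1, alive)
                absent = self._build(
                    depth + 1, frozenset(n for n in alive if e not in self.sets[n]))
                if present == absent:   # rule 1: testing e changes nothing here
                    self.memo[key] = present
                else:                   # rule 2: shared through the unique table
                    self.memo[key] = self._make(("test", e, absent, present))
        return self.memo[key]

    def classify(self, item):
        """Follow one branch per remaining test; reduced-away tests are skipped."""
        node = self.nodes[self.root]
        while node[0] == "test":
            _, e, absent, present = node
            node = self.nodes[present if e in item else absent]
        return list(node[1])


SETS = {"A": {2, 3}, "B": {4}, "C": {1, 3}}
diagram = ReducedDiagram(SETS, range(1, 5))
print(diagram.classify({1, 4}))  # ['B']
print(len(diagram.nodes))        # 18, versus 31 nodes in the complete tree
```

    The size of a reduced diagram depends heavily on the variable order; ascending element value is just the simplest choice here, not necessarily the best one.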

    The Wikipedia page on ROBDDs has more information, as well as links to libraries that implement this data structure in various languages.
