efficient algorithm to compare similarity between sets of numbers?

前端 未结 12 673
甜味超标
甜味超标 2021-02-01 11:13

I have a large number of sets of numbers. Each set contains 10 numbers and I need to remove all sets that have 5 or more number (unordered) matches with any other set.

F

12条回答
  •  暖寄归人
    2021-02-01 11:56

    Another perfect job for a Signature Tree. Once again I'm aghast that there isn't a library out there that implements them. Let me know if you write one.

    From the abstract of the first paper in the search results above:

    We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.

提交回复
热议问题