I have an application where I have a number of sets. A set might be
{4, 7, 12, 18}
The numbers are unique and all less than 50.
I then have several data items:
1 {1,
You could use an inverted index of your data items. For your example:
1 {1, 2, 4, 7, 8, 12, 18, 23, 29}
2 {3, 4, 6, 7, 15, 23, 34, 38}
3 {4, 7, 12, 18}
4 {1, 4, 7, 12, 13, 14, 15, 16, 17, 18}
5 {2, 4, 6, 7, 13, 15}
the inverted index will be
1: {1, 4}
2: {1, 5}
3: {2}
4: {1, 2, 3, 4, 5}
5: {}
6: {2, 5}
...
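To make the construction concrete, here is a minimal Python sketch that builds such an index from the five example items. The names `items` and `build_inverted_index` are made up for illustration; numbers that occur in no item (such as 5) are simply absent from the result rather than mapped to an empty set.

```python
from collections import defaultdict

# Data items from the example: item id -> set of numbers.
items = {
    1: {1, 2, 4, 7, 8, 12, 18, 23, 29},
    2: {3, 4, 6, 7, 15, 23, 34, 38},
    3: {4, 7, 12, 18},
    4: {1, 4, 7, 12, 13, 14, 15, 16, 17, 18},
    5: {2, 4, 6, 7, 13, 15},
}

def build_inverted_index(items):
    """Map each number to the sorted list of item ids that contain it."""
    index = defaultdict(list)
    # Visiting item ids in increasing order keeps every list sorted.
    for item_id in sorted(items):
        for number in items[item_id]:
            index[number].append(item_id)
    return dict(index)

inverted = build_inverted_index(items)
print(inverted[4])  # item ids that contain the number 4
```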
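So, for any particular set {x_0, x_1, ..., x_i} you need to intersect the index sets for x_0, x_1, and so on. For example, for the set {2,3,4} you need to intersect {1,5} with {2} and with {1,2,3,4,5}, which here gives the empty set: no data item contains 2, 3 and 4 together. Because the sets in the inverted index can all be kept sorted, each pairwise intersection can be driven by the shorter of the two sets (for instance, by binary-searching each of its elements in the longer one), so the cost grows with the minimum of the two lengths, up to a logarithmic factor, rather than with their sum.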
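The lookup described above could be sketched in Python roughly as follows. This is only one way to do it: the function names are hypothetical, the index is hard-coded for a few numbers from the example, and binary search via the standard `bisect` module is an assumption about how to exploit the sorted lists. An empty query is treated as matching nothing, which is an arbitrary choice.

```python
from bisect import bisect_left

# Part of the example index, hard-coded: number -> sorted item ids.
inverted = {
    2: [1, 5],
    3: [2],
    4: [1, 2, 3, 4, 5],
    7: [1, 2, 3, 4, 5],
    12: [1, 3, 4],
    18: [1, 3, 4],
}

def intersect_sorted(short, long):
    """Keep the elements of `short` that also occur in the sorted list `long`."""
    result = []
    for x in short:
        pos = bisect_left(long, x)  # binary search in the longer list
        if pos < len(long) and long[pos] == x:
            result.append(x)
    return result

def items_containing(query, inverted):
    """Return the ids of data items that contain every number in `query`."""
    # Shortest lists first, so the running result stays as small as possible.
    postings = sorted((inverted.get(x, []) for x in query), key=len)
    if not postings:
        return []  # assumption: an empty query matches nothing
    result = list(postings[0])
    for other in postings[1:]:
        if not result:
            break  # already empty, nothing more can match
        result = intersect_sorted(result, other)
    return result

print(items_containing({2, 3, 4}, inverted))
print(items_containing({4, 7, 12, 18}, inverted))
```

Each pairwise step costs about len(short) * log(len(long)) comparisons, so starting from the shortest list keeps the work small even when a later list is long.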
One possible issue is very 'popular' numbers (such as 4 in our example), whose index sets can be huge.
A few words about intersecting. You could keep sorted lists in the inverted index and intersect them in pairs, in increasing order of length. Or, as you have no more than 50K items, you could use compressed bit sets and intersect them bitwise: an uncompressed bit set needs about 6 KB per number (50,000 bits), less for sparse bit sets, and with only about 50 numbers that is roughly 300 KB in total, which is not too memory-hungry. For sparse bit sets that should be efficient, I think.
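As a rough illustration of the bit set variant, the sketch below uses plain Python integers as uncompressed bit sets, with bit k set when item k contains the number. All names are hypothetical, and a real compressed bit set library would take the place of the plain integers; the intersection itself stays a bitwise AND.

```python
# Part of the example index, hard-coded: number -> item ids.
inverted = {
    2: [1, 5],
    3: [2],
    4: [1, 2, 3, 4, 5],
    7: [1, 2, 3, 4, 5],
    12: [1, 3, 4],
    18: [1, 3, 4],
}

def to_bitset(item_ids):
    """Encode item ids as an integer with bit k set for each id k."""
    bits = 0
    for item_id in item_ids:
        bits |= 1 << item_id
    return bits

def from_bitset(bits):
    """Decode a non-negative integer bit set back into a sorted list of ids."""
    ids = []
    item_id = 0
    while bits:
        if bits & 1:
            ids.append(item_id)
        bits >>= 1
        item_id += 1
    return ids

bit_index = {number: to_bitset(ids) for number, ids in inverted.items()}

def items_containing_bits(query, bit_index):
    """Return ids of items containing every number in `query`, via bitwise AND."""
    numbers = list(query)
    if not numbers:
        return []  # assumption: an empty query matches nothing
    result = bit_index.get(numbers[0], 0)
    for x in numbers[1:]:
        result &= bit_index.get(x, 0)  # a missing number contributes an empty set
    return from_bitset(result)

print(items_containing_bits({4, 7, 12, 18}, bit_index))
```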