I have an application that works with a number of sets. A set might be {4, 7, 12, 18}: the numbers within a set are unique and all less than 50.
I then have several data items: 1 {1,
This is not a real answer, more an observation: this problem looks like it could be efficiently parallelized or even distributed. The asymptotic complexity stays the same, but the wall-clock running time would drop to roughly O(n / number of cores).
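To make the observation a bit more concrete, here is a rough sketch of how the work could be split across cores. The question is cut off, so the per-item operation is a guess on my part: I assume each data item is itself a set of numbers and that you want to know which of your sets are fully contained in it. Because every number is below 50, each set fits in a single integer bitmask. The names `to_mask`, `match_chunk` and `match_all` are made up for this example.

```python
from concurrent.futures import ProcessPoolExecutor


def to_mask(values):
    """Encode a set of small non-negative integers (< 50 here) as a bitmask."""
    mask = 0
    for v in values:
        mask |= 1 << v
    return mask


def match_chunk(item_masks, set_masks):
    """For each item, list the indices of the sets it fully contains.

    ASSUMPTION: "match" means subset containment; the original question
    is truncated, so swap in whatever test you actually need.
    """
    return [
        [i for i, s in enumerate(set_masks) if s & item == s]
        for item in item_masks
    ]


def match_all(items, sets, executor, workers=4):
    """Split the items into about `workers` chunks and match them in parallel."""
    set_masks = [to_mask(s) for s in sets]
    item_masks = [to_mask(it) for it in items]
    size = max(1, -(-len(item_masks) // workers))  # ceiling division
    chunks = [item_masks[i:i + size] for i in range(0, len(item_masks), size)]
    # executor.map keeps chunk order, so results line up with `items`.
    results = executor.map(match_chunk, chunks, [set_masks] * len(chunks))
    return [hits for chunk in results for hits in chunk]


if __name__ == "__main__":
    sets = [{4, 7, 12, 18}, {1, 2}]
    items = [{1, 2, 4, 7, 12, 18}, {4, 7, 12}, {1, 2, 3}]
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(match_all(items, sets, pool))
```

Each chunk is independent, so nothing needs to be shared between workers apart from the read-only list of set masks. Whether this pays off in practice depends on how many data items there are: for small inputs the cost of starting processes and shipping the chunks around will outweigh the gain.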