I need to create a match-finder system for a data set, as follows:
There is a set of objects, each identified by a string ObjectID.
Each obj
What you are trying to build here is an inverted index.
For each column, map each value to a "set" of the objects that have it. Then, you can intersect the sets to get the result.
So, APPLE: RED ROUND FRUIT would map to the following inserts:
SADD p1:RED APPLE
SADD p2:ROUND APPLE
SADD p3:FRUIT APPLE
Then, let's say I want to query for * ROUND FRUIT. I would do:
SINTER p2:ROUND p3:FRUIT
This command takes the intersection of the items in the p2:ROUND set and the p3:FRUIT set. It returns all the items that are ROUND and FRUIT, regardless of what p1 is.
Some other examples:
SMEMBERS p1:GREEN
SINTER p1:RED p2:ROUND p3:FRUIT
SUNION p1:RED p1:GREEN
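To make the idea concrete, here is a minimal Python sketch of the same inverted index. It is not Redis code: plain Python sets stand in for the Redis sets, and the helper names add_object and query, as well as the extra objects CHERRY, LIME and BRICK, are made up for illustration.

```python
from collections import defaultdict

# Plain Python sets stand in for Redis sets (an assumption for this sketch).
# The key layout "p1:RED", "p2:ROUND", ... follows the answer above.
index = defaultdict(set)

def add_object(object_id, properties):
    """Equivalent of one SADD per property position."""
    for position, value in enumerate(properties, start=1):
        index[f"p{position}:{value}"].add(object_id)

def query(pattern):
    """Intersect the sets of every non-wildcard position, like SINTER."""
    keys = [f"p{i}:{v}" for i, v in enumerate(pattern, start=1) if v != "*"]
    if not keys:
        raise ValueError("pattern needs at least one non-wildcard value")
    # .get() avoids creating empty entries for values that were never indexed.
    sets = [index.get(key, set()) for key in keys]
    return set.intersection(*sets)

add_object("APPLE", ["RED", "ROUND", "FRUIT"])
add_object("CHERRY", ["RED", "ROUND", "FRUIT"])   # hypothetical object
add_object("LIME", ["GREEN", "ROUND", "FRUIT"])   # hypothetical object
add_object("BRICK", ["RED", "SQUARE", "OBJECT"])  # hypothetical object

print(sorted(query(["*", "ROUND", "FRUIT"])))  # ['APPLE', 'CHERRY', 'LIME']
```

The wildcard position simply contributes no set to the intersection, which is why p1 is ignored in the query.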
My answer above is going to use some computation power at query time, because the intersection operation is O(N*M) in the worst case (N being the cardinality of the smallest set and M the number of sets). Here is a way of doing it that is more memory intensive, but will have faster retrieval because it effectively precomputes the indexes.
For each combination of properties, make a key that stores a set:
So, APPLE: RED ROUND FRUIT would map to the following inserts:
SADD RED:ROUND:FRUIT APPLE
SADD :ROUND:FRUIT APPLE
SADD RED::FRUIT APPLE
SADD RED:ROUND: APPLE
SADD RED:: APPLE
SADD :ROUND: APPLE
SADD ::FRUIT APPLE
SADD :: APPLE
Then, to query, you simply access the respective key. For example, * ROUND FRUIT would simply be:
SMEMBERS :ROUND:FRUIT
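The inserts do not have to be written by hand. Here is a rough Python sketch that generates every keep-or-blank combination for an object and then answers a query with a single lookup. As before, plain Python sets stand in for Redis sets, and the names combination_keys, add_object and query, plus the second object LIME, are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Plain Python sets stand in for Redis sets (an assumption for this sketch).
# A blank position in a key means "any value".
index = defaultdict(set)

def combination_keys(properties):
    """Yield every key obtained by keeping or blanking each property."""
    choices = [(value, "") for value in properties]
    for combo in product(*choices):
        yield ":".join(combo)

def add_object(object_id, properties):
    """Equivalent of one SADD per generated key."""
    for key in combination_keys(properties):
        index[key].add(object_id)

def query(pattern):
    """Build the key for the pattern and read it directly, like SMEMBERS."""
    key = ":".join("" if value == "*" else value for value in pattern)
    return index.get(key, set())

add_object("APPLE", ["RED", "ROUND", "FRUIT"])
add_object("LIME", ["GREEN", "ROUND", "FRUIT"])  # hypothetical object

print(sorted(query(["*", "ROUND", "FRUIT"])))  # ['APPLE', 'LIME']
```

No intersection happens at query time here. All of the work is moved to the insert side.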
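Obviously, this doesn't scale well at all in terms of memory when you have many dimensions, since each object is stored under 2^k keys for k properties (8 keys for the 3 properties above), but it will be extremely snappy to retrieve results because a query is a single key lookup.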
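To get a feel for the memory trade-off, here is a back-of-the-envelope Python sketch that counts set memberships for both approaches. The figure of 1000 objects and the function names are assumptions picked for illustration. Real Redis memory use also depends on key and member sizes, which this ignores.

```python
def inverted_index_entries(num_objects, num_properties):
    """First approach: one set membership per property position."""
    return num_objects * num_properties

def precomputed_entries(num_objects, num_properties):
    """Second approach: one set membership per keep-or-blank combination."""
    return num_objects * 2 ** num_properties

# 1000 objects is an arbitrary, hypothetical data set size.
for k in (3, 10, 20):
    print(k, inverted_index_entries(1000, k), precomputed_entries(1000, k))
# 3 3000 8000
# 10 10000 1024000
# 20 20000 1048576000
```

With 3 properties the precomputed layout costs under 3x more entries, but at 20 properties it is already tens of thousands of times larger.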