Is there any way to have multiple tag search implemented in CouchDB? I have documents (posts) each with multiple tags. I need to find posts that have been tagged with an arbitra
I think the following should give you a slightly complicated but solid algorithm -- i.e. it does find the first results fast, even if you have very many documents. It will probably not perform well in practice :(
Index the documents by each single tag and their document id:
[<some tag>, <document id>]
E.g. for a document docid1 tagged blue, green and red, and a document docid2 tagged blue and yellow,
you get
['blue', 'docid1'] ['blue', 'docid2'] ['green', 'docid1'] ['red', 'docid1'] ['yellow', 'docid2']
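To make the index layout concrete, here is a small sketch that builds the same keys in memory. It is only a stand-in for a view, not CouchDB code: the function name build_tag_index and the "tags" field are assumptions, and Python's list ordering only approximates CouchDB's key collation.

```python
def build_tag_index(docs):
    """Return sorted [tag, docid] keys, one per (document, tag) pair.

    Hypothetical in-memory stand-in for a view that emits [tag, doc._id].
    """
    rows = []
    for doc in docs:
        for tag in doc["tags"]:  # "tags" is an assumed field name
            rows.append([tag, doc["_id"]])
    rows.sort()  # a view keeps its rows sorted by key; sort() mimics that here
    return rows


docs = [
    {"_id": "docid1", "tags": ["blue", "green", "red"]},
    {"_id": "docid2", "tags": ["blue", "yellow"]},
]
index = build_tag_index(docs)
for key in index:
    print(key)
```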
Now for each tag you want to search for you open a parallel search starting at [tag, ...].
For each tag you maintain a current search position. If the docids at all your searches match, you found a match. If they do not match, try to skip to at least the highest document id via a range search. Repeat.
[It's basically a join.]
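A rough sketch of that join, run against a sorted in-memory list instead of a real view. Every bisect_left call stands in for one range query against the server (startkey = [tag, docid]); the function name intersect_tags and the tuple layout are assumptions for illustration.

```python
from bisect import bisect_left


def intersect_tags(index, tags):
    """Yield docids that carry every tag in `tags`.

    `index` is a sorted list of (tag, docid) tuples standing in for the view.
    """
    if not tags:
        return
    # One cursor per tag, each starting at the first row for that tag.
    positions = [bisect_left(index, (tag, "")) for tag in tags]
    while True:
        current = []
        for tag, pos in zip(tags, positions):
            if pos >= len(index) or index[pos][0] != tag:
                return  # one tag has no rows left, so no further match exists
            current.append(index[pos][1])
        highest = max(current)
        if all(docid == highest for docid in current):
            yield highest
            # Move every cursor past the match.
            positions = [pos + 1 for pos in positions]
        else:
            # Skip every cursor forward to the highest docid seen so far.
            positions = [bisect_left(index, (tag, highest)) for tag in tags]


index = sorted([
    ("blue", "docid1"), ("blue", "docid2"), ("green", "docid1"),
    ("red", "docid1"), ("yellow", "docid2"),
])
print(list(intersect_tags(index, ["blue", "green"])))
```

Against a real server each skip would be a separate request, which is where the round-trip cost comes from.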
The skipping is theoretically fast: We have an index to find these documents. Practically, it's probably slow because of all the round trips to the server. It would be nice to be able to offload that algorithm to a function executed on the server. Is that possible?
In the more recent versions of CouchDB, you can POST to a view with a JSON document containing a keys array, which allows for multi-key lookup. The structure would look something like this:
{"keys": ["first_tag", "second_tag", "third_tag"]}
This could be POSTed to a view that you have that is emitting tags for its respective keys.
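Note that a multi-key lookup returns the rows for any of the keys, so finding documents that have all the tags still needs a step on the client. A minimal sketch, assuming the view emits the tag as key and that the response rows have the usual id/key/value shape; the function names and the sample rows are made up for illustration, and no request is actually sent.

```python
import json


def keys_request_body(tags):
    """Build the JSON body for a multi-key POST to a tag-keyed view."""
    return json.dumps({"keys": list(tags)})


def ids_matching_all_tags(rows, tags):
    """Return the ids that appear in `rows` under every requested tag."""
    wanted = set(tags)
    tags_by_id = {}
    for row in rows:
        if row["key"] in wanted:
            tags_by_id.setdefault(row["id"], set()).add(row["key"])
    return sorted(doc_id for doc_id, found in tags_by_id.items() if found == wanted)


# Illustrative rows only, shaped like the "rows" list of a view response.
rows = [
    {"id": "docid1", "key": "blue", "value": None},
    {"id": "docid2", "key": "blue", "value": None},
    {"id": "docid1", "key": "green", "value": None},
]
print(keys_request_body(["blue", "green"]))
print(ids_matching_all_tags(rows, ["blue", "green"]))
```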
This and other querying options are documented here.
I have solved this problem by creating a view with a recursive function. Here is the gist: https://gist.github.com/820412
So, as far as I understood, the answer is NO. CouchDB can't query for documents having presence of multiple tags (a workaround with lucene or mysql doesn't count, since this way we lose some features of CouchDB). Sad news :(.
(having presence of multiple tags - having both A and B, not A or B)
UPD! It's possible but with limitations to only 2-3 tags.
http://wiki.apache.org/couchdb/EntityRelationship
Querying by multiple keys
Some applications need to view the intersection of entities that have multiple keys. In the example above, this would be a query for the contacts who are in both the "Friends" and the "Colleagues" groups. The most straight-forward way to handle this situation is to query for one of the keys, and then to filter by the rest of the keys on the client-side. If the key frequencies vary greatly, it may also be worthwhile to make an initial call to determine the key with the lowest frequency, and to use that to fetch the initial document list from the database.
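The "lowest frequency first" idea can be sketched like this. The dictionaries stand in for a group-keyed view and for the stored contact documents; all names and the sample contacts are hypothetical, and a real setup would ask the database for the counts instead of measuring list lengths.

```python
def query_by_key(view, key):
    """Stand-in for querying a view with key=<key>; returns matching doc ids."""
    return view.get(key, [])


def filter_by_rarest_key(view, docs, keys):
    """Fetch ids for the least frequent key, then filter by the rest client-side."""
    rarest = min(keys, key=lambda k: len(query_by_key(view, k)))
    wanted = set(keys)
    return [
        doc_id
        for doc_id in query_by_key(view, rarest)
        if wanted <= set(docs[doc_id]["groups"])  # "groups" is an assumed field
    ]


docs = {
    "alice": {"groups": ["Friends", "Colleagues"]},
    "bob": {"groups": ["Friends"]},
    "carol": {"groups": ["Friends", "Family"]},
}
view = {
    "Friends": ["alice", "bob", "carol"],
    "Colleagues": ["alice"],
    "Family": ["carol"],
}
print(filter_by_rarest_key(view, docs, ["Friends", "Colleagues"]))
```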
If this is not a good option, it is possible to index the combinations of the keys, though the growth of the index for a given document will be exponential with the number of its keys. Still, for small-ish key sets, this is an option, since the keys can be ordered, and keys which are prefixes of a larger key can be omitted. For instance, for the key set [1 2 3] the possible key combinations are [1] [2] [3] [1 2] [1 3] [2 3] [1 2 3]. However, the index need only contain the keys [3] [1 3] [2 3] [1 2 3], since (for example) the documents matching the keys [1 2] could be obtained with a query for startkey=[1,2,null] and endkey=[1,2,{}]. The number of index entries will be 2^(n-1), where n is the number of keys.
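The set of keys to emit for one document can be worked out mechanically. This is a sketch of the rule as described in the quoted text, not code from the wiki; index_keys is a made-up name.

```python
from itertools import combinations


def index_keys(keys):
    """Return the ordered key combinations that are not a prefix of a larger one."""
    ordered = sorted(keys)
    subsets = [
        list(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]

    def is_proper_prefix(short, long):
        return len(short) < len(long) and long[:len(short)] == short

    # Drop every combination that some longer combination starts with.
    return [s for s in subsets if not any(is_proper_prefix(s, o) for o in subsets)]


print(index_keys([1, 2, 3]))  # [[3], [1, 3], [2, 3], [1, 2, 3]]
```

The surviving combinations are exactly those that contain the largest key, which is why the count comes out as 2^(n-1).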
A final option is to use a separate index, such as couchdb-lucene to help with such queries.
Actually, tagging seems to be a very relational problem and does not play well with CouchDB's design. So I have decided to keep one small database for tags in MySQL and store the actual documents in CouchDB. This lets me get the best of both worlds. Although this technique has problems related to synchronization, searching on tags is an efficient operation in SQL, and the tag data is small enough that replication or sharding is not a worry. Thanks for all your answers.
One way of doing it is as explained above by Ryan Duffield. Although it handles some of the queries, it will become unmanageable over time. The other way is to use full-text search, which is not currently supported by CouchDB itself, but there is an external plugin using Lucene. More here: http://wiki.apache.org/couchdb/Full_text_search