Datastore solution for tag search

前端 未结 3 1054
不知归路
不知归路 2021-02-06 12:23

I\'ve got millions of items ordered by a precomputed score. Each item has many boolean attributes. Let says that there is about ten thousand possible attributes totally, each

相关标签:
3条回答
  • 2021-02-06 13:07

    Redis would be a perfect candidate for

    • "the top n items" for "millions of items ordered by score"

    Redis has a built in data structure that you can start from: Sorted Set => every member of a Sorted Set is associated with score. Which for example can be ranked by score with ZRANGEBYSCORE:

    ZRANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]
    

    I encourage you to look at Sorted Set commands and get a feel for Redis, as your problem (as it is stated) asks for it. You may of course keep as many attributes as you'd like within a single Set element.


    As far as MongoDB, since you mentioned millions, unless you can bent incremental queries to work for your problem, I would not expect a sub second response.

    As @nickdos mentioned Solr Relevancy is a quite powerful feature, but the number of attributes will be a problem, since it would need to keep all this attributes in memory for each item. Although a dozen for each may not be that bad => just try and see.

    0 讨论(0)
  • 2021-02-06 13:17

    This is exactly what Mongo can deal with. The fact that your attributes are boolean type helps here. A possible schema is listed below:

    [
        {
            true_tags:[attr1, attr2, attr3, ...],
            false_tags: [attr4, attr5, attr6, ...]
        },
    ]
    

    Then we can index on true_tags and false_tags. And it should be efficient to search with $in, $all, ... query operators.

    0 讨论(0)
  • 2021-02-06 13:20

    Mongodb can handle what you want, if you stored your objects like this

    { score:2131, attributes: ["attr1", "attr2", "attr3"], ... }
    

    Then the following query will match all the items that have att1 and attr2

    c = db.mycol.find({ attributes: { $all: [ "attr1", "attr2" ] } })
    

    but this won't match it

    c = db.mycol.find({ attributes: { $all: [ "attr1", "attr4" ] } })
    

    the query returns a cursor, if you want this cursor to be sorted, then just add the sort parameters to the query like so

    c = db.mycol.find({ attributes: { $all: [ "attr1", "attr2" ] }}).sort({score:1})
    

    Have a look at Advanced Queries to see what's possible.

    Appropriate indexes can be setup as follows

    db.mycol.ensureIndex({attributes:1, score:1})
    

    And you can get performance information using

    db.mycol.find({ attributes: { $all: [ "attr1" ] }}).explain()
    

    Mongo explains how many objects were scanned, how long the operation took and various other statistics.

    0 讨论(0)
提交回复
热议问题