Sorting CouchDB Views By Value

独厮守ぢ 2020-12-04 08:52

I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view that returns the top queries from the results.

7 Answers
  • 2020-12-04 09:03

    Based on Avi's answer, I came up with this CouchDB list function that worked for my needs, which is simply a report of the most popular events (key=event name, value=attendees).

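    // Query with ?group=true so that each row carries one event's count.
    ddoc.lists.eventPopularity = function(head, req) {
      start({ headers : { "Content-Type" : "text/plain" } });
      var data = [], row, i;
      // Collect every row from the view
      while ((row = getRow())) {
        data.push(row);
      }
      // Sort by value (attendee count), highest first
      data.sort(function(a, b) {
        return a.value - b.value;
      }).reverse();
      for (i = 0; i < data.length; i++) {
        send(data[i].value + ': ' + data[i].key + "\n");
      }
    };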
    

    For reference, here's the corresponding view function:

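    ddoc.views.eventPopularity = {
      map : function(doc) {
        if (doc.type === 'user') {
          // Emit one row per event the user attends
          for (var i in doc.events) {
            emit(doc.events[i].event_name, 1);
          }
        }
      },
      // Built-in reduce: counts the rows for each key
      reduce : '_count'
    };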
    

    And the output of the list function (snipped):

    165: Design-Driven Innovation: How Designers Facilitate the Dialog
    165: Are Your Customers a Crowd or a Community?
    164: Social Media Mythbusters
    163: Don't Be Afraid Of Creativity! Anything Can Happen
    159: Do Agencies Need to Think Like Software Companies?
    158: Customer Experience: Future Trends & Insights
    156: The Accidental Writer: Great Web Copy for Everyone
    155: Why Everything is Amazing But Nobody is Happy
    
  • 2020-12-04 09:05

    This came up on the CouchDB-user mailing list, and Chris Anderson, one of the primary developers, wrote:

    This is a common request, but not supported directly by CouchDB's views -- to do this you'll need to copy the group-reduce query to another database, and build a view to sort by value.

    This is a tradeoff we make in favor of dynamic range queries and incremental indexes.

    I needed to do this recently as well, and I ended up doing it in my app tier. This is easy to do in JavaScript:

    db.view('mydesigndoc', 'myview', {'group':true}, function(err, data) {
    
        if (err) throw new Error(JSON.stringify(err));
    
        data.rows.sort(function(a, b) {
            return a.value - b.value;
        });
    
        data.rows.reverse(); // optional, depending on your needs
    
        // do something with the data…
    });
    

    This example runs in Node.js and uses node-couchdb, but it could easily be adapted to run in a browser or another JavaScript environment. And of course the concept is portable to any programming language/environment.

    HTH!

  • 2020-12-04 09:08

    This is an old question, but I feel it still deserves a decent answer (I spent at least 20 minutes searching for the correct one...)

    I disagree with the other suggestions in the answers here and find them unsatisfactory. In particular, I don't like the suggestion to sort the rows in the application layer, as it doesn't scale well and doesn't handle the case where you need to limit the result set in the DB.

    The better approach that I came across is suggested in this thread: if you need to sort by certain values in the query, add them to the key and then query the key using a range, fixing the leading part of the key and leaving the rest of the range open. For example, if your key is composed of country, state and city:

    emit([doc.address.country, doc.address.state, doc.address.city], doc);
    

    Then you query just the country and get free sorting on the rest of the key components:

    startkey=["US"]&endkey=["US",{}] 
    

    If you also need to reverse the order, note that simply setting descending=true will not suffice. You also need to swap the start and end keys, i.e.:

    startkey=["US",{}]&endkey=["US"]&descending=true
    

    For more details, see this great source.
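
    The range query above can also be built programmatically. Below is a minimal sketch, assuming a hypothetical helper named buildRangeQuery (it is not part of CouchDB or of any client library) that JSON-encodes the keys and swaps them when descending order is requested:

```javascript
// Hypothetical helper: builds the query string for a complex-key range.
// `prefix` is the leading part of the key, e.g. ["US"].
function buildRangeQuery(prefix, descending) {
  var low = prefix;                // ["US"] sorts before every ["US", ...] key
  var high = prefix.concat([{}]);  // {} collates after numbers, strings and arrays
  var params = descending
    ? { startkey: high, endkey: low, descending: true }  // keys swapped
    : { startkey: low, endkey: high };
  return Object.keys(params).map(function (name) {
    // Key parameters are sent as URL-encoded JSON
    return name + '=' + encodeURIComponent(JSON.stringify(params[name]));
  }).join('&');
}

console.log(buildRangeQuery(['US'], false));
console.log(buildRangeQuery(['US'], true));
```

    The resulting string would be appended to the view URL after a "?". Encoding the keys with JSON.stringify avoids quoting mistakes when the key parts come from user input.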

  • 2020-12-04 09:12

    It is true that there is no dead-simple answer. There are several patterns however.

    1. http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags. I do not personally like this because they acknowledge that it is a brittle solution, and the code is not relaxing-looking.

    2. Avi's answer, which is to sort in-memory in your application.

    3. couchdb-lucene, which it seems everybody finds themselves needing eventually!

    4. What I like is what Chris said in Avi's quote. Relax. In CouchDB, databases are lightweight and excel at giving you a unique perspective of your data. These days, the buzz is all about filtered replication which is all about slicing out subsets of your data to put in a separate DB.

      Anyway, the basics are simple. You take the .rows from the view output and insert them into a separate DB whose view simply emits rows keyed on the count. An additional trick is to write a very simple _list function. Lists "render" the raw couch output into different formats. Your _list function should output

      { "docs":
          [ {..view row1...},
            {..view row2...},
            {..etc...}
          ]
      }
      

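      That formats the view output exactly the way the _bulk_docs API requires. Now you can pipe curl directly into another curl that posts to _bulk_docs, and then query the by_count view in the second database:

      curl host:5984/db/_design/myapp/_list/bulkdocs_formatter/query_popularity \
       | curl -X POST -H 'Content-Type: application/json' -d @- host:5984/popularity_sorter/_bulk_docs
      curl host:5984/popularity_sorter/_design/myapp/_view/by_count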
      
    5. In fact, if your list function can handle all the docs, you may just have it sort them itself and return them to the client sorted.
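
    A minimal sketch of the pieces that pattern 4 describes, assuming the source view is queried with group=true so each row has the shape {key, value}. The names toBulkDocs, bulkdocsFormatter and byCountMap are hypothetical; getRow(), send() and emit() are supplied by CouchDB when the functions run inside a design document:

```javascript
// Turns view rows into the body expected by POST /db/_bulk_docs.
function toBulkDocs(rows) {
  return {
    docs: rows.map(function (row) {
      // One new document per view row; no _id is set, so CouchDB assigns one
      return { key: row.key, value: row.value };
    })
  };
}

// Sketch of the list function. In a real design document the body of
// toBulkDocs would have to be inlined, since each function is stored alone.
function bulkdocsFormatter(head, req) {
  var rows = [], row;
  while ((row = getRow())) {
    rows.push(row);
  }
  send(JSON.stringify(toBulkDocs(rows)));
}

// Map function for the by_count view in the second database:
// keying on the count lets CouchDB do the sorting.
function byCountMap(doc) {
  emit(doc.value, doc.key);
}

// Local check with made-up rows (not real query data)
console.log(JSON.stringify(toBulkDocs([
  { key: 'couchdb', value: 12 },
  { key: 'views', value: 7 }
])));
```

    Querying by_count with descending=true and a limit would then return the most popular entries first.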

  • 2020-12-04 09:17

    I'm unsure about the 1 you have as your returned result, but I'm positive this should do the trick:

    emit([doc.hits, split[i]], 1);

    The rules of sorting are defined in the docs.

  • 2020-12-04 09:17

    The Retrieve_the_top_N_tags link seems to be broken, but I found another solution here.

    Quoting the dev who wrote that solution:

    rather than returning the results keyed by the tag in the map step, I would emit every occurrence of every tag instead. Then in the reduce step, I would calculate the aggregation values grouped by tag using a hash, transform it into an array, sort it, and choose the top 3.

    As stated in the comments, the only problem would be in case of a long tail:

    Problem is that you have to be careful with the number of tags you obtain; if the result is bigger than 500 bytes, you'll have couchdb complaining about it, since "reduce has to effectively reduce". 3 or 6 or even 20 tags shouldn't be a problem, though.

    It worked perfectly for me. Check the link to see the code!
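
    The linked code is not reproduced here, but the quoted approach could look roughly like the sketch below. It is one reading of the description, not the original author's code. The names topTagsReduce and TOP_N are illustrative, and it assumes the map function emits each tag as the value, for example emit(null, tag). Because the list is trimmed at every reduce step, a tag that is never in the top N of any partial result can be dropped even if its overall count is high, so treat the output as approximate on large datasets.

```javascript
var TOP_N = 3; // illustrative limit, matching the "top 3" in the quote

function topTagsReduce(keys, values, rereduce) {
  var counts = {};
  values.forEach(function (value) {
    if (rereduce) {
      // value is an array of [tag, count] pairs from an earlier reduce call
      value.forEach(function (pair) {
        counts[pair[0]] = (counts[pair[0]] || 0) + pair[1];
      });
    } else {
      // value is a single tag emitted by the map function
      counts[value] = (counts[value] || 0) + 1;
    }
  });
  // Hash -> array of [tag, count], sorted by count (highest first),
  // then by tag name so that ties come out in a fixed order
  return Object.keys(counts)
    .map(function (tag) { return [tag, counts[tag]]; })
    .sort(function (a, b) { return b[1] - a[1] || (a[0] < b[0] ? -1 : 1); })
    .slice(0, TOP_N);
}

// Local check with made-up tags (not real data)
console.log(topTagsReduce(null, ['a', 'b', 'a', 'c', 'b', 'a', 'd'], false));
```

    Keeping TOP_N small is what keeps the reduce output within the size limit mentioned in the comment quoted above.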
