CouchDB - filter latest log per logged instance from a list

Submitted by 自古美人都是妖i on 2020-01-13 17:05:05

Question


I could use some help filtering distinct values from a CouchDB view. I have a database that stores logs with information about computers. New logs for each computer are written to the database periodically.

Slightly simplified, I store entries like these:

{
   "name": "NAS",
   "os": "Linux",
   "timestamp": "2011-03-03T16:26:39Z"
}
{
   "name": "Server1",
   "os": "Windows",
   "timestamp": "2011-02-03T19:31:31Z"
}
{
   "name": "NAS",
   "os": "Linux",
   "timestamp": "2011-02-03T18:21:29Z"
}

So far I am struggling to filter this list down to distinct entries. What I'd like to receive is the latest log for each device.

I have a view like this:

function(doc) {
    emit([doc.timestamp,doc.name], doc);
}

I'm querying this view with Python (couchdbkit), and the best solution I have come up with so far looks like this:

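def get_latest_logs(cls):
    # Rows come back newest first, so the first log seen for a name is its latest.
    seen_names = set()
    latest = []
    for log in cls.view("logs/timestamp", descending=True):
        if log.name not in seen_names:
            seen_names.add(log.name)
            latest.append(log)
    return latest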
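The loop relies on the view's row order. Here is a pure-Python simulation (no CouchDB or couchdbkit involved; the helper names are hypothetical) of what the view above returns with descending=True, followed by the same first-seen-per-name filter. Because the key starts with the timestamp, the newest log of every device is the first one seen.

```python
# Pure-Python simulation of the view emit([doc.timestamp, doc.name], doc)
# queried with descending=True. This is NOT CouchDB; it only mimics row order.

logs = [
    {"name": "NAS", "os": "Linux", "timestamp": "2011-03-03T16:26:39Z"},
    {"name": "Server1", "os": "Windows", "timestamp": "2011-02-03T19:31:31Z"},
    {"name": "NAS", "os": "Linux", "timestamp": "2011-02-03T18:21:29Z"},
]


def view_rows_descending(docs):
    """Return (key, value) rows sorted by [timestamp, name], newest first."""
    rows = [([doc["timestamp"], doc["name"]], doc) for doc in docs]
    # For these same-format ASCII keys, Python list ordering matches the
    # order CouchDB would use for the view.
    return sorted(rows, key=lambda row: row[0], reverse=True)


def latest_first_seen(rows):
    """Keep the first (i.e. newest) row for each device name."""
    seen = set()
    latest = []
    for _key, doc in rows:
        if doc["name"] not in seen:
            seen.add(doc["name"])
            latest.append(doc)
    return latest


result = latest_first_seen(view_rows_descending(logs))
for doc in result:
    print(doc["name"], doc["timestamp"])
```

The cost the question worries about is visible here: every row is visited, even after each device's latest log has been found.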

Okay ... this works. But I have a strong feeling that this is not the best solution, as Python needs to iterate over the whole list of log files (which could become quite long).

I guess I need a reduce function, but I couldn't find any examples or explanations that I could adapt to my problem.

So what I am looking for is a (pure CouchDB) view that only returns the latest log for each device.


Answer 1:


Here is what I do. This is borderline CouchDB abuse however I have had much success.

Usually, reduce will compute a sum, or a count, or something like that. However, think of reduce as an elimination tournament. Many values go in. Only one comes out. A reduction! Repeat over and over and you have the ultimate winner (a re-reduction). In this case, the log with the latest timestamp is the winner.
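The tournament picture also explains why the winner must not depend on how CouchDB splits the values: CouchDB may reduce several batches separately and then rereduce the batch winners. This small pure-Python sketch (a stand-in for illustration, not the JavaScript reduce function in this answer; `pick_latest` and `reduce_in_batches` are hypothetical names) checks that property for a "latest timestamp wins" rule on the question's sample logs.

```python
# Pure-Python stand-in for a "latest timestamp wins" reduce.
# It checks that reducing in batches and then rereducing the batch winners
# gives the same champion as one big reduction.

logs = [
    {"name": "NAS", "os": "Linux", "timestamp": "2011-03-03T16:26:39Z"},
    {"name": "Server1", "os": "Windows", "timestamp": "2011-02-03T19:31:31Z"},
    {"name": "NAS", "os": "Linux", "timestamp": "2011-02-03T18:21:29Z"},
]


def pick_latest(values):
    """Return the value with the greatest timestamp, or None if there are none."""
    winner = None
    for challenger in values:
        if winner is None or challenger["timestamp"] > winner["timestamp"]:
            winner = challenger
    return winner


def reduce_in_batches(values, batch_size):
    """Reduce each batch, then run a rereduce over the batch winners."""
    batches = [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
    winners = [pick_latest(batch) for batch in batches]
    return pick_latest(winners)  # the rereduce step


whole = pick_latest(logs)
for size in range(1, len(logs) + 1):
    assert reduce_in_batches(logs, size) is whole
print(whole["name"], whole["timestamp"])  # NAS 2011-03-03T16:26:39Z
```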

Of course, welterweights can't fight heavyweights. There have to be leagues and weight classes. It only makes sense for certain documents to do battle with certain other similar documents. That is exactly what the reduce group parameter will do. It will ensure that only evenly-matched gladiators enter the steel cage in our bloodsport. (Coffee is kicking in.)

First, emit all logs keyed by device. The value emitted is simply a copy of the document.

function(doc) {
    emit(doc.name, doc);
}

Next, write a reduce function to return the latest timestamp of all given values. If you see a fight between two gladiators from different leagues (two logs from different systems), stop the fight! Something went wrong (somebody queried without the correct group value).

function(keys, vals, re) {
    var challenger, winner = null;
    for(var a = 0; a < vals.length; a++) {
        challenger = vals[a];
        if(challenger === null) {
            // A lower-level reduction already stopped a fight. Pass that on
            // instead of crashing on challenger.name below.
            return null;
        } else if(!winner) {
            // The title is unchallenged. This value is the winner.
            winner = challenger;
        } else {
            // Fight!
            if(winner.name !== challenger.name) {
                // Stop the fight! He's gonna kill him!
                // CouchDB may reduce mixed keys while building the index,
                // but a ?group=true query never returns such a reduction.
                return null;
            } else if(winner.timestamp > challenger.timestamp) {
                // The champ wins! (Nothing to do.)
            } else {
                // The challenger wins!
                winner = challenger;
            }
        }
    }

    // Today's champion lives to fight another day.
    return winner;
}
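To test the reduce logic without a running CouchDB, it can be ported line by line to Python. This is only a local test harness mirroring the JavaScript (CouchDB itself runs the JavaScript version); `tournament_reduce` is a hypothetical name, and it also treats a `None` value from an earlier stopped fight as a stopped fight.

```python
# Python port of the "tournament" reduce, for local testing only.

def tournament_reduce(keys, values, rereduce):
    """Return the latest log among values, or None if two devices meet."""
    winner = None
    for challenger in values:
        if challenger is None:
            # An earlier (re)reduction already stopped a fight; keep reporting it.
            return None
        if winner is None:
            # The title is unchallenged. This value is the winner.
            winner = challenger
        elif winner["name"] != challenger["name"]:
            # Stop the fight: logs from different devices were mixed.
            return None
        elif winner["timestamp"] > challenger["timestamp"]:
            pass  # The champ wins.
        else:
            winner = challenger  # The challenger wins.
    return winner


nas_old = {"name": "NAS", "os": "Linux", "timestamp": "2011-02-03T18:21:29Z"}
nas_new = {"name": "NAS", "os": "Linux", "timestamp": "2011-03-03T16:26:39Z"}
server1 = {"name": "Server1", "os": "Windows", "timestamp": "2011-02-03T19:31:31Z"}

print(tournament_reduce(None, [nas_old, nas_new], False)["timestamp"])  # 2011-03-03T16:26:39Z
print(tournament_reduce(None, [nas_new, server1], False))  # None
```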

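(Note: comparing the timestamps as strings works here because they are all ISO 8601 UTC timestamps in the same fixed-width format, so lexicographic order matches chronological order. If your timestamps mix time zones or formats, convert them to Date objects before comparing.)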
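A quick Python check of the timestamp question: for same-format UTC strings like the sample logs, string comparison agrees with comparing parsed datetimes, but once offsets are mixed it can disagree. The second pair of timestamps is a hypothetical example, not data from the question.

```python
from datetime import datetime

# %z accepts both "Z" and "+02:00" style offsets (Python 3.7+).
FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse(stamp):
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, FORMAT)


# Same fixed-width format, all UTC: string order equals time order.
utc_a, utc_b = "2011-03-03T16:26:39Z", "2011-02-03T19:31:31Z"
print((utc_a > utc_b) == (parse(utc_a) > parse(utc_b)))  # True

# Mixed offsets (hypothetical values): string order disagrees with time order.
local = "2011-03-03T16:26:39+02:00"  # 14:26:39 UTC
utc = "2011-03-03T15:00:00Z"
print(local > utc, parse(local) > parse(utc))  # True False
```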

Now, when you query the view with ?group=true, CouchDB will only reduce (find the winner among) values with the same key, which is your machine name.
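Here is a pure-Python simulation of what ?group=true does with the emit(doc.name, doc) map: rows are sorted by key, consecutive rows with an equal key form one group, and the reduce only ever sees one group's values. It is a model of the semantics, not CouchDB's actual B-tree implementation, and it uses a simplified `max`-based reduce; all helper names are hypothetical.

```python
from itertools import groupby

logs = [
    {"name": "NAS", "os": "Linux", "timestamp": "2011-03-03T16:26:39Z"},
    {"name": "Server1", "os": "Windows", "timestamp": "2011-02-03T19:31:31Z"},
    {"name": "NAS", "os": "Linux", "timestamp": "2011-02-03T18:21:29Z"},
]


def map_by_name(doc):
    """Mirror of the map function: emit(doc.name, doc)."""
    yield doc["name"], doc


def latest(values):
    """Simplified reduce: the log with the greatest timestamp wins."""
    return max(values, key=lambda doc: doc["timestamp"])


def query_group_true(docs, map_fn, reduce_fn):
    """Sort emitted rows by key, then reduce each run of equal keys separately."""
    rows = sorted((row for doc in docs for row in map_fn(doc)), key=lambda row: row[0])
    return [
        {"key": key, "value": reduce_fn([value for _, value in group])}
        for key, group in groupby(rows, key=lambda row: row[0])
    ]


result = query_group_true(logs, map_by_name, latest)
for row in result:
    print(row["key"], row["value"]["timestamp"])
```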

(You can also emit an array as a key, which gives a bit more flexibility. You could emit([doc.name, doc.timestamp], doc) instead. Then you can see all logs for one system with a query like ?reduce=false&startkey=["NAS", null]&endkey=["NAS", {}], or see the latest log per system with ?group_level=1.)
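Key parameters such as startkey and endkey must be JSON values, so when building these URLs by hand it helps to JSON-encode them before URL-encoding. A minimal sketch using only the standard library; the database name "mydb", design document "logs", and view "by_device" are hypothetical placeholders for a view that emits [doc.name, doc.timestamp].

```python
import json
from urllib.parse import urlencode

# Hypothetical database, design document, and view names.
VIEW_URL = "http://localhost:5984/mydb/_design/logs/_view/by_device"


def json_param(value):
    """JSON-encode a key parameter compactly (no spaces to escape)."""
    return json.dumps(value, separators=(",", ":"))


def all_logs_for(device):
    """Every log of one device: reduce=false over [device, null]..[device, {}]."""
    params = {
        "reduce": "false",
        "startkey": json_param([device, None]),  # null sorts before any string
        "endkey": json_param([device, {}]),      # {} sorts after any string
    }
    return VIEW_URL + "?" + urlencode(params)


def latest_per_device():
    """One reduced row per device: group on the first element of the array key."""
    return VIEW_URL + "?" + urlencode({"group_level": 1})


print(all_logs_for("NAS"))
print(latest_per_device())
```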

Finally, the "stop the fight" stuff is optional. You could simply always return the document with the latest timestamp. However, I prefer to keep it there because in similar situations, I want to see if I am map-reducing incorrectly, and a null reduce output is my big clue.



Source: https://stackoverflow.com/questions/5198023/couchdb-filter-latest-log-per-logged-instance-from-a-list
