I'm trying to detect the "trend" of a value in a collection.
Let's say I have the following:
{ created_at: 2014-12-01, value:1015 }
{ created_at: 201
Presuming these are actual dates, and going by your comments about "aggregate per minute", the only real way to do this in a single pass on the server is with mapReduce. The key here is that mapReduce can hold a global variable and can therefore "track" your last result in order to determine the "difference" between each aggregated record:
db.collection.mapReduce(
    function() {
        // Round date to the minute (epoch milliseconds, floored)
        var key = this.created_at.valueOf()
            - ( this.created_at.valueOf() % ( 1000 * 60 ) );
        emit( key, { "average": this.value } );
    },
    function(key,values) {
        // Note: reduce can run more than once per key, so this is an
        // average of averages rather than an exact mean
        values = values.map(function(i) { return i.average });
        var result = {
            "average": Math.floor(Array.avg(values))
        };
        return result;
    },
    {
        "out": { "inline": 1 },
        // Global variable shared across finalize calls
        "scope": { "lastAvg": 0 },
        "finalize": function(key,value) {
            // Difference from the previous minute; 0 for the first one
            value.diff = ( lastAvg == 0 ) ? 0 : value.average - lastAvg;
            lastAvg = value.average;
            return value;
        }
    }
)
Alternately you can "post-process" as has been mentioned and do the same thing in your client code to calculate the difference as you iterate the cursor with a similar scoped variable. As a shell example:
var lastAvg = 0;
db.collection.aggregate([
    { "$group": {
        // Date minus epoch gives milliseconds; subtract the remainder
        // to floor the value to the minute
        "_id": { "$subtract": [
            { "$subtract": [ "$created_at", new Date(0) ] },
            { "$mod": [
                { "$subtract": [ "$created_at", new Date(0) ] },
                1000 * 60
            ]}
        ]},
        "average": { "$avg": "$value" }
    }},
    { "$sort": { "_id": 1 } }
]).forEach(function(doc) {
    doc.average = Math.floor(doc.average);
    doc.diff = ( lastAvg == 0 ) ? 0 : doc.average - lastAvg;
    lastAvg = doc.average;
    printjson(doc);
})
In both cases I am using date math to convert the date object into a unix/epoch timestamp as a number, which is rounded down to the start of its minute. With the aggregation framework you could alternately use the date aggregation operators to extract the date parts for grouping.
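As a rough sketch of that alternative, the grouping key could be built from the date aggregation operators instead of epoch arithmetic. The field names `created_at` and `value` are taken from the question; the key names inside `_id` (`year`, `month` and so on) are just illustrative choices.

```javascript
// Sketch only: group per minute using date parts rather than epoch math.
// "created_at" and "value" come from the question; the _id key names
// are arbitrary labels chosen for this example.
var pipeline = [
    { "$group": {
        "_id": {
            "year":   { "$year": "$created_at" },
            "month":  { "$month": "$created_at" },
            "day":    { "$dayOfMonth": "$created_at" },
            "hour":   { "$hour": "$created_at" },
            "minute": { "$minute": "$created_at" }
        },
        "average": { "$avg": "$value" }
    }},
    // The keys are listed from largest to smallest unit so that sorting
    // on the embedded _id document follows chronological order
    { "$sort": { "_id": 1 } }
];

// In the shell this would be passed as db.collection.aggregate(pipeline)
```

The trade-off is that the `_id` is then a document of parts rather than a single number, so turning it back into a `Date` takes a little more work in the client.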
In either case it's really simple to re-cast that as a Date object where required, either internally with .mapReduce() or in post-processing using .aggregate().
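To make the rounding and the re-cast concrete, here is a minimal sketch in plain JavaScript. `floorToMinute` is a hypothetical helper name, not something from the shell or driver; it simply repeats the arithmetic used in the map function above.

```javascript
// Hypothetical helper: floor a Date to the start of its minute,
// returned as epoch milliseconds (same math as the map function).
function floorToMinute(date) {
    var ms = date.valueOf();
    return ms - ( ms % ( 1000 * 60 ) );
}

var key = floorToMinute(new Date("2014-12-01T10:15:42Z"));

// Re-cast the numeric key as a Date when a real date is needed
var bucketStart = new Date(key);
console.log(bucketStart.toISOString()); // 2014-12-01T10:15:00.000Z
```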
To wrap up, you can either use the "global scope" functionality of mapReduce, or you can just process the resulting cursor from aggregate in order to work out the differences between each grouping in the results.
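The client-side variant boils down to one pass over the sorted results with a remembered previous value. The sketch below runs on a plain array so it can be tried without a database; `addDiffs` is a hypothetical name and the sample rows are made up for illustration. It also assumes a `null` sentinel rather than `0` for "no previous value", so that a genuine average of 0 is not mistaken for the first row.

```javascript
// Sketch: add a "diff" to each per-minute row, given rows already
// sorted by _id. addDiffs is a hypothetical helper name.
function addDiffs(rows) {
    var lastAvg = null; // null means "no previous bucket yet"
    return rows.map(function(row) {
        var average = Math.floor(row.average);
        var diff = ( lastAvg === null ) ? 0 : average - lastAvg;
        lastAvg = average;
        return { "_id": row._id, "average": average, "diff": diff };
    });
}

// Made-up sample data standing in for the aggregate output
var sample = [
    { "_id": 0,      "average": 1015.4 },
    { "_id": 60000,  "average": 1020 },
    { "_id": 120000, "average": 1012.9 }
];

var diffs = addDiffs(sample).map(function(r) { return r.diff; });
console.log(diffs); // [ 0, 5, -8 ]
```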
Rough outline: I would calculate the average for the ten minute period:
> var avgCursor = db.sensor_readings.aggregate([
    { "$match" : { "created_at" : { "$gt" : ten_minutes_ago, "$lte" : now } } },
    { "$group" : { "_id" : 0, "average" : { "$avg" : "$value" } } }
])
> var avgDoc = avgCursor.toArray()[0]
> avgDoc
{ "_id" : 0, "average" : 23 }
Then I would store it in another collection:
> db.sensor_averages.insert({ "start" : ten_minutes_ago, "end" : now, "average" : avgDoc.average })
Finally, recall the two averages you need to compute the difference, and compute it:
> var diffCursor = db.sensor_averages.find({ "start" : { "$gte" : twenty_minutes_ago } }).sort({ "start" : -1 })
> var diffArray = diffCursor.toArray()
> var difference = diffArray[0].average - diffArray[1].average
You could also skip the periodic aggregations and instead keep a running average updated in sensor_averages, jumping to a new doc every 10 minutes. At the beginning of each 10 minute period, insert into sensor_averages a doc
{
"start" : now,
"svalues" : 0,
"nvalues" : 0
}
then on each insert of a sensor_reading document for the next ten minutes, also update the sensor_averages doc:
db.sensor_averages.update(
{ "start" : now_rounded_to_the_ten_minute_boundary },
{ "$inc" : { "svalues" : value, "nvalues" : 1 } }
)
Then, when you want the difference between averages, recall the appropriate two docs, divide svalues by nvalues to get the average, and subtract.
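A small sketch of that arithmetic in plain JavaScript, assuming docs shaped like the one above (`start`, `svalues`, `nvalues`). The helper names are hypothetical, the sample numbers are invented, and `floorToTenMinutes` is only one possible way to produce the ten minute boundary used as the update key.

```javascript
// Hypothetical helper: floor a Date to its ten minute boundary
function floorToTenMinutes(date) {
    var ms = date.valueOf();
    return new Date(ms - ( ms % ( 1000 * 60 * 10 ) ));
}

// Average for one running-total doc; null if nothing was recorded,
// which avoids dividing by zero
function bucketAverage(doc) {
    return ( doc.nvalues === 0 ) ? null : doc.svalues / doc.nvalues;
}

// Difference between the current and the previous period
function averageDifference(current, previous) {
    var a = bucketAverage(current);
    var b = bucketAverage(previous);
    return ( a === null || b === null ) ? null : a - b;
}

// Invented sample docs standing in for two sensor_averages documents
var previous = { "svalues": 230, "nvalues": 10 };
var current  = { "svalues": 250, "nvalues": 10 };
console.log(averageDifference(current, previous)); // 2
```

Returning `null` for an empty period is a design choice for this sketch; treating it as 0 would make a gap in the readings look like a real drop in the value.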