Suppose I have a website with a bunch of articles, and people can vote on the articles they like.
I want to be able to query for the articles with the most votes.
The common way to track the overall vote count is to keep the number of votes in the post document and to increment it atomically in the same update that pushes a new vote onto the votes array.
Because it is a single update to a single document, you are guaranteed that the count will match the number of elements in the array.
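A minimal sketch of what that single update does, modelled in plain JavaScript so it runs without a server. The mongo shell form is shown in a comment; `Votes` and `votedate` follow the pipelines in this answer, while `voteCount` and `voter` are hypothetical field names.

```javascript
// Hypothetical post document: voteCount and voter are assumed field names.
// In the mongo shell the same change is one update, roughly:
//   db.posts.update({ _id: postId },
//                   { $push: { Votes: vote }, $inc: { voteCount: 1 } });
function addVote(post, vote) {
  // Both modifiers are applied together, mirroring a single atomic update:
  // the array grows and the counter moves in the same step.
  post.Votes.push(vote);
  post.voteCount += 1;
  return post;
}

const post = { _id: 1, title: "Intro to MongoDB", Votes: [], voteCount: 0 };
addVote(post, { voter: "alice", votedate: new Date("2012-06-15T13:00:00Z") });
addVote(post, { voter: "bob", votedate: new Date("2012-06-16T09:30:00Z") });

// The invariant that the single update guarantees on the server.
console.log(post.voteCount === post.Votes.length); // true
```

Reading the top posts overall is then just a sort on the counter, e.g. `db.posts.find().sort({ voteCount: -1 }).limit(10)`.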
If the set of aggregations is fixed and the site is very busy, you could extend this pattern and increment additional counters, for example one per month, day and hour, but that can get out of hand very quickly. Instead, you could use the new Aggregation Framework (available in the 2.1.2 development release, and due for production in release 2.2). It is simpler to use than Map/Reduce, and it lets you do the calculations you want quite simply, especially if you take care to store your vote dates as ISODate() (BSON date) values.
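To see why the per-period counter approach grows quickly, here is a hypothetical helper that builds the `$inc` document for one vote when you keep month, day and hour counters inside the post. The `counts.*` field layout is an assumption for illustration, and buckets are computed in UTC.

```javascript
// Hypothetical helper: one $inc document per vote, with per-period counters.
// The "counts.month/day/hour" layout is an assumed schema, not a standard one.
function pad(n) {
  return String(n).padStart(2, "0");
}

function counterIncFor(voteDate) {
  // Build bucket keys from the UTC components of the vote date.
  const month = `${voteDate.getUTCFullYear()}-${pad(voteDate.getUTCMonth() + 1)}`;
  const day = `${month}-${pad(voteDate.getUTCDate())}`;
  const hour = `${day}T${pad(voteDate.getUTCHours())}`;
  return {
    voteCount: 1,
    [`counts.month.${month}`]: 1,
    [`counts.day.${day}`]: 1,
    [`counts.hour.${hour}`]: 1,
  };
}

const inc = counterIncFor(new Date("2012-06-15T13:45:00Z"));
console.log(inc);
// → keys voteCount, counts.month.2012-06, counts.day.2012-06-15,
//   counts.hour.2012-06-15T13, each with value 1
// Used as: db.posts.update({ _id: postId }, { $push: { Votes: vote }, $inc: inc })
```

Every extra granularity adds another key to each update and another field per bucket to every post, forever, which is how this gets out of hand; ad-hoc periods (say, "last 10 days") are not covered at all.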
A typical aggregation pipeline for the top vote getters this month might look something like this:
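// month boundaries, in the shell's local time zone
var today = new Date();
var thisMonth = new Date(today.getFullYear(), today.getMonth());
var thisMonthEnd = new Date(today.getFullYear(), today.getMonth() + 1);
db.posts.aggregate( [
  // pre-filter: skip posts that cannot have any vote this month
  {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
  // one document per vote
  {$unwind: "$Votes" },
  // drop the individual votes that fall outside this month
  {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
  // count votes per title (assumes titles are unique)
  {$group: { _id: "$title", votes: {$sum: 1} } },
  {$sort: {"votes": -1} },
  {$limit: 10}
] );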
This first narrows the input to posts that have votes in the month you are counting, then "unwinds" the Votes array to get one document per vote. The second $match is needed because the first one selects whole posts, so their votes from other months would otherwise be counted too. The pipeline then does the equivalent of a "group by", summing up all votes for each title (I'm assuming titles are unique), sorts descending by number of votes, and limits the output to the first ten.
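As an illustration only, the same computation can be emulated in memory to show what each stage contributes. The function name `topVotedTitles` and the sample data are hypothetical; in practice the work happens inside mongod.

```javascript
// In-memory emulation of the $match / $unwind / $match / $group / $sort / $limit
// pipeline; topVotedTitles is a hypothetical name used for illustration.
function topVotedTitles(posts, start, end, limit = 10) {
  const inRange = (d) => d >= start && d < end;
  const counts = new Map();
  for (const post of posts) {
    // Pre-filter (simplified): skip posts without any vote in the range.
    if (!post.Votes.some((v) => inRange(v.votedate))) continue;
    // $unwind + second $match: look at each vote on its own.
    for (const vote of post.Votes) {
      if (!inRange(vote.votedate)) continue;
      // $group by title with $sum: 1.
      counts.set(post.title, (counts.get(post.title) || 0) + 1);
    }
  }
  // $sort by votes descending, then $limit.
  return [...counts]
    .map(([title, votes]) => ({ _id: title, votes }))
    .sort((a, b) => b.votes - a.votes)
    .slice(0, limit);
}

const june = new Date(Date.UTC(2012, 5, 1));
const july = new Date(Date.UTC(2012, 6, 1));
const posts = [
  { title: "A", Votes: [{ votedate: new Date("2012-06-02T10:00:00Z") },
                        { votedate: new Date("2012-05-30T10:00:00Z") }] },
  { title: "B", Votes: [{ votedate: new Date("2012-06-03T10:00:00Z") },
                        { votedate: new Date("2012-06-20T10:00:00Z") }] },
  { title: "C", Votes: [{ votedate: new Date("2012-07-01T00:00:00Z") }] },
];
const top = topVotedTitles(posts, june, july);
console.log(top);
// → B with 2 votes, then A with 1 (A's May vote and C's July vote are excluded)
```

Dropping the second filter would give A two votes, which is exactly the over-counting the second $match prevents.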
You can also aggregate votes by day (for example) within that month to see which days are the most active for voting:
db.posts.aggregate( [
  {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
  {$unwind: "$Votes" },
  {$match: { "Votes.votedate": {$gte: thisMonth, $lt: thisMonthEnd} } },
  // bucket each vote by its day of the month ($dayOfMonth works in UTC)
  {$project: { "day": { "$dayOfMonth": "$Votes.votedate" } } },
  // count votes per day and list the busiest days first
  {$group: { _id: "$day", votes: {$sum: 1} } },
  {$sort: {"votes": -1} },
  {$limit: 10}
] );
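A small in-memory sketch of the per-day grouping, assuming the same `Votes`/`votedate` layout; `votesPerDay` and the sample data are hypothetical. Like `$dayOfMonth` in the aggregation framework, `getUTCDate()` buckets by the UTC day, so a vote late in the evening local time may land on the next day.

```javascript
// Illustrative per-day count: unwind, keep votes in [start, end),
// bucket by UTC day of month, count, sort descending, limit.
function votesPerDay(posts, start, end, limit = 10) {
  const counts = new Map();
  for (const post of posts) {
    for (const vote of post.Votes) {
      const d = vote.votedate;
      if (d < start || d >= end) continue;
      const day = d.getUTCDate(); // same bucketing as $dayOfMonth (UTC)
      counts.set(day, (counts.get(day) || 0) + 1);
    }
  }
  return [...counts]
    .map(([day, votes]) => ({ _id: day, votes }))
    .sort((a, b) => b.votes - a.votes)
    .slice(0, limit);
}

const posts = [
  { title: "A", Votes: [{ votedate: new Date("2012-06-02T10:00:00Z") },
                        { votedate: new Date("2012-06-15T08:00:00Z") }] },
  { title: "B", Votes: [{ votedate: new Date("2012-06-15T22:00:00Z") },
                        { votedate: new Date("2012-05-31T23:00:00Z") }] },
];
const perDay = votesPerDay(posts, new Date(Date.UTC(2012, 5, 1)),
                           new Date(Date.UTC(2012, 6, 1)));
console.log(perDay);
// → day 15 with 2 votes, then day 2 with 1 (the May 31 vote is excluded)
```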
The schema you choose depends largely on your use case. If you are expecting a lot of votes or comments and want to process them independently of the post they belong to, you might keep them in a separate collection with postID as the "foreign key". However, if you want to load all the votes when you load a particular post, and the votes have no meaning on their own without the post that houses them, then go with the embedding approach (in your case, the first one).
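For comparison, a sketch of the separate-collection variant, assuming a hypothetical `votes` collection whose documents look like `{ postId, votedate }`. Because each vote is already its own document, the monthly ranking needs no `$unwind`; the mongo shell form is in a comment and the function below emulates it in memory.

```javascript
// Hypothetical "votes" collection, one document per vote, linked by postId.
// Mongo shell equivalent, roughly:
//   db.votes.aggregate([
//     { $match: { votedate: { $gte: thisMonth, $lt: thisMonthEnd } } },
//     { $group: { _id: "$postId", votes: { $sum: 1 } } },
//     { $sort: { votes: -1 } },
//     { $limit: 10 }
//   ]);
function topPostIds(votes, start, end, limit = 10) {
  const counts = new Map();
  for (const { postId, votedate } of votes) {
    if (votedate < start || votedate >= end) continue; // $match on the date
    counts.set(postId, (counts.get(postId) || 0) + 1); // $group by postId
  }
  return [...counts]
    .map(([postId, n]) => ({ _id: postId, votes: n }))
    .sort((a, b) => b.votes - a.votes) // $sort descending
    .slice(0, limit); // $limit
}

const votes = [
  { postId: 1, votedate: new Date("2012-06-02T10:00:00Z") },
  { postId: 2, votedate: new Date("2012-06-03T10:00:00Z") },
  { postId: 2, votedate: new Date("2012-06-04T10:00:00Z") },
  { postId: 1, votedate: new Date("2012-07-02T10:00:00Z") },
];
const ranking = topPostIds(votes, new Date(Date.UTC(2012, 5, 1)),
                           new Date(Date.UTC(2012, 6, 1)));
console.log(ranking);
// → postId 2 with 2 votes, then postId 1 with 1
```

The trade-off is that titles are no longer in the result: you would fetch them from `posts` with a second query, for example a `find` with `$in` on the returned ids.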