I'm somewhat confused with the current state of mapreduce support in GAE. According to the docs http://code.google.com/p/appengine-mapreduce/ reduce phase isn't supported yet,
You don't really need a reduce phase. You can accomplish this with a linear task chain, more or less as follows:
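from google.appengine.ext import db
from google.appengine.ext import deferred

# Car and CarsByColor are the db.Model classes from the question.

def count_colors(limit=100, totals=None, cursor=None):
    # Avoid a shared mutable default: start a fresh tally on the first task.
    if totals is None:
        totals = {}
    query = Car.all()
    if cursor:
        query.with_cursor(cursor)
    cars = query.fetch(limit)
    for car in cars:
        totals[car.color] = totals.get(car.color, 0) + 1
    if len(cars) == limit:
        # A full batch means there may be more cars: chain the next task.
        return deferred.defer(count_colors, limit, totals, query.cursor())
    # Last batch: store one CarsByColor entity per color.
    entities = []
    for color, count in totals.items():
        entity = CarsByColor(key_name=color)
        entity.cars_num = count
        entities.append(entity)
    db.put(entities)

deferred.defer(count_colors)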
This should iterate over all your cars, pass a query cursor and a running tally to a series of ad-hoc tasks, and store the totals at the end.
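To see the hand-off between tasks in isolation, here is a rough plain-Python simulation. It is only a sketch: a list stands in for the datastore, an integer offset stands in for the query cursor, and a loop stands in for the task queue. `fetch_batch`, `count_step`, and `run_chain` are hypothetical names, not App Engine APIs.

```python
from collections import Counter

def fetch_batch(colors, cursor, limit):
    """Stand-in for a cursored fetch: returns (batch, next_cursor)."""
    batch = colors[cursor:cursor + limit]
    return batch, cursor + len(batch)

def count_step(colors, limit, totals, cursor):
    """One simulated task: tally a batch and report whether to chain another."""
    batch, cursor = fetch_batch(colors, cursor, limit)
    totals.update(batch)
    more = len(batch) == limit  # a full batch means there may be more rows
    return totals, cursor, more

def run_chain(colors, limit=2):
    """Run simulated tasks until a short batch ends the chain."""
    totals, cursor, more, steps = Counter(), 0, True, 0
    while more:
        totals, cursor, more = count_step(colors, limit, totals, cursor)
        steps += 1
    return dict(totals), steps
```

Note that when the number of cars is an exact multiple of the batch size, the chain runs one extra task that fetches nothing and then stores the totals.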
A reduce phase might make sense if you had to merge data from multiple datastores, multiple models, or multiple indexes in a single model. As it stands, I don't think it would buy you anything.
Another option: use the task queue to maintain live counters for each color. When you create a car, kick off a task to increment the total for that color. When you update a car, kick off one task to decrement the old color and another to increment the new color. Update counters transactionally to avoid race conditions.
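The bookkeeping for the live counters can be sketched roughly like this. It is a plain-Python illustration, not datastore code: `counter_deltas` and `apply_delta` are hypothetical helpers, and the dict stands in for the CarsByColor entities. On App Engine, each (color, delta) pair would be handed to its own task, and the read-modify-write in `apply_delta` would run inside a transaction.

```python
def counter_deltas(old_color=None, new_color=None):
    """Return (color, delta) pairs for a car being created, recolored, or deleted.

    old_color=None means the car is new; new_color=None means it was deleted.
    """
    if old_color == new_color:
        return []  # color unchanged: nothing to adjust
    deltas = []
    if old_color is not None:
        deltas.append((old_color, -1))
    if new_color is not None:
        deltas.append((new_color, 1))
    return deltas

def apply_delta(counters, color, delta):
    """In-memory stand-in for the transactional update one task would perform."""
    counters[color] = counters.get(color, 0) + delta
    return counters

# Usage: create a red car, then repaint it blue.
counters = {}
for color, delta in counter_deltas(new_color="red"):
    apply_delta(counters, color, delta)
for color, delta in counter_deltas(old_color="red", new_color="blue"):
    apply_delta(counters, color, delta)
```

Keeping one task per delta means a failed increment can be retried without redoing the matching decrement.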