Simple counter example using mapreduce in Google App Engine

小蘑菇 2021-02-06 02:31

I'm somewhat confused about the current state of mapreduce support in GAE. According to the docs at http://code.google.com/p/appengine-mapreduce/ the reduce phase isn't supported yet,

2 answers
  •  深忆病人
    2021-02-06 02:57

    Here is the solution I eventually worked out using mapreduce on GAE (without the reduce phase). If I had started from scratch, I would probably have used the solution provided by Drew Sears.

    It works on GAE Python 1.5.0.

    In app.yaml I added the handler for mapreduce:

    - url: /mapreduce(/.*)?
      script: $PYTHON_LIB/google/appengine/ext/mapreduce/main.py
    

    and the handler for my own mapreduce code (I'm using the URL /mapred_update to gather the results produced by mapreduce):

    - url: /mapred_.*
      script: mapred.py
    

    I created mapreduce.yaml to process Car entities:

    mapreduce:
    - name: Color_Counter
      params:
      - name: done_callback
        value: /mapred_update
      mapper:
        input_reader: google.appengine.ext.mapreduce.input_readers.DatastoreInputReader
        handler: mapred.process
        params:
        - name: entity_kind
          default: models.Car
    

    Explanation: done_callback is a URL that is called after mapreduce finishes its operations. mapred.process is a function that processes an individual entity and updates the counters (it is defined in the mapred.py file). The Car model is defined in models.py.
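
    The counter mechanics described above can be tried out locally before deploying. The following is only a minimal sketch in plain Python, without the App Engine SDK: FakeCar and run_locally are hypothetical stand-ins for the Car model and for the framework's driver loop, and the assumption is that the framework adds one to the named counter for every increment the mapper yields and keeps its own mapper_calls counter.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the datastore-backed Car model.
FakeCar = namedtuple("FakeCar", ["color"])


def process(entity):
    """Mimic the mapper: yield one counter name per car that has a color."""
    color = entity.color
    if color:
        yield "car_color_%s" % color


def run_locally(entities):
    """Play the framework's role: call the mapper and sum the increments."""
    counters = Counter()
    for entity in entities:
        # Assumption: the framework counts every mapper invocation itself.
        counters["mapper_calls"] += 1
        for counter_name in process(entity):
            counters[counter_name] += 1
    return dict(counters)


cars = [FakeCar("red"), FakeCar("blue"), FakeCar("red"), FakeCar(None)]
totals = run_locally(cars)
print(totals)
```

    A car without a color still counts as a mapper call but produces no color counter, which is why the bookkeeping counter has to be filtered out before the stats are stored.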

    mapred.py:

    from models import CarsByColor
    from google.appengine.ext import db
    from google.appengine.ext.mapreduce import operation as op
    from google.appengine.ext.mapreduce.model import MapreduceState
    
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    
    def process(entity):
        """Process individual Car"""
        color = entity.color
        if color:
            yield op.counters.Increment('car_color_%s' % color)
    
    class UpdateCounters(webapp.RequestHandler):
        """Create CarsByColor stats entities from the counters
        gathered by the mapreduce job."""
        def post(self):
            """Called after the mapreduce job has finished."""
            # The id of the finished mapreduce job is passed in the request headers
            job_id = self.request.headers['Mapreduce-Id']
            state = MapreduceState.get_by_job_id(job_id)
            to_put = []
            counters = state.counters_map.counters
            # Remove the framework counter that is not needed for the stats;
            # pop() avoids a KeyError if the counter is absent
            counters.pop('mapper_calls', None)
            for counter in counters.keys():
                # Reuse the existing stats entity for this counter, if any
                stat = CarsByColor.get_by_key_name(counter)
                if not stat:
                    stat = CarsByColor(key_name=counter,
                                       name=counter)
                stat.value = counters[counter]
                to_put.append(stat)
            # Write all stats entities in a single batch
            db.put(to_put)
    
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Updated.')
    
    
    application = webapp.WSGIApplication(
        [('/mapred_update', UpdateCounters)],
        debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
    
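    The callback above deletes one counter by name and stores every remaining counter under its full name. A possible variation, sketched here in plain Python and not part of the original solution, is to keep only the counters that carry the mapper's prefix and strip that prefix, so that any other bookkeeping counters are ignored automatically. The helper color_totals and the sample dictionary are hypothetical.

```python
PREFIX = "car_color_"  # same prefix the mapper uses for its counter names


def color_totals(counters, prefix=PREFIX):
    """Return {color: count}, ignoring counters that lack the prefix."""
    return {
        name[len(prefix):]: value
        for name, value in counters.items()
        if name.startswith(prefix)
    }


# Example counter map, shaped like the dictionary read in the callback.
sample = {"mapper_calls": 4, "car_color_red": 2, "car_color_blue": 1}
stats = color_totals(sample)
print(stats)
```

    Filtering by prefix means the callback does not need to know which counters the framework maintains, at the cost of changing the key names stored in CarsByColor.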

    The definition of the CarsByColor model is slightly changed compared to the one in the question.

    You can start the mapreduce job manually from the URL http://yourapp/mapreduce/ and, hopefully, from cron as well (I haven't tested cron yet).
