How do I return data from a deferred task in Google App Engine

Question

Original Question

I have a working version of my web application that I am trying to upgrade at the moment, and I'm running into the issue of a task that takes too long to complete during a single HTTP request. The application receives a JSON list from a JavaScript front end via an HTTP POST and returns a sorted/sliced version of that list. As the input list gets longer, the sorting operation (obviously) takes much longer. On sufficiently long input lists I hit the 60-second HTTP request timeout, and the application fails.

I would like to start using the deferred library to perform the sort task, but I'm not clear on how to store/retrieve the data after I perform that task. Here is my current code:

import json
import logging

import webapp2
from google.appengine.runtime import DeadlineExceededError

class getLineups(webapp2.RequestHandler):
  def post(self):
    jsonstring = self.request.body
    inputData = json.loads(jsonstring)
    playerList = inputData["pList"]
    positions = ["QB","RB","WR","TE","DST"]

    # sortByPos, getNFLRosters and makeLineups are the application's own helpers (not shown).
    playersPos = sortByPos(playerList,positions)
    rosters, playerUse = getNFLRosters(playersPos, positions)
    try:
      # This step is computationally expensive; it will fail on large player lists.
      lineups = makeLineups(rosters,playerUse,50000)

      self.response.headers["Content-Type"] = "application/json"
      self.response.out.write(json.dumps(lineups))
    except DeadlineExceededError:
      # Raised by the runtime when the request deadline is reached.
      logging.error("60 second timeout reached on player list of length: %d", len(playerList))
      self.response.headers["Content-Type"] = "text/plain"
      self.response.set_status(504)

app = webapp2.WSGIApplication([
  ('/lineup',getLineups),
], debug = True)

Ideally I would like to replace the entire try/except block with a call to the deferred task library:

deferred.defer(makeLineups,rosters,playerUse,50000)

But I'm unclear on how I would get the result back from that operation. I'm thinking I would have to store it in the Datastore, and then retrieve it, but how would my JavaScript front end know when the operation is complete? I've read the documentation on Google's site, but I'm still hazy on how to accomplish this task.

How I Solved It

Using the basic outline in the accepted answer, here's how I solved this problem:

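import json

import webapp2
from google.appengine.ext import deferred
from google.appengine.ext import ndb

# The deferred library must be enabled in app.yaml ("builtins: - deferred: on").
# sortByPos, getNFLRosters and makeLineups are the application's own helpers (not shown).

class Result(ndb.Model):
  playerList = ndb.JsonProperty()
  positions = ndb.JsonProperty()
  # Declared so that the computed lineups are saved with the entity.
  lineups = ndb.JsonProperty()
  solveComplete = ndb.BooleanProperty(default=False)

def solveResult(result_key):
  # Runs inside the deferred task, outside the 60-second request deadline.
  result = result_key.get()

  playersPos = sortByPos(result.playerList, result.positions)
  rosters, playerUse = getNFLRosters(playersPos,result.positions)

  lineups = makeLineups(rosters,playerUse,50000)
  storeResult(result_key,lineups)

@ndb.transactional
def storeResult(result_key,lineups):
  result = result_key.get()
  result.lineups = lineups
  result.solveComplete = True
  result.put()

class getLineups(webapp2.RequestHandler):
  def post(self):
    jsonstring = self.request.body
    inputData = json.loads(jsonstring)

    deferredResult = Result(
      playerList = inputData["pList"],
      positions = ["QB","RB","WR","TE","DST"],
      solveComplete = False
    )

    deferredResult_key = deferredResult.put()

    # Enqueue the expensive work and hand the entity key back to the client at once.
    deferred.defer(solveResult,deferredResult_key)

    self.response.headers["Content-Type"] = "text/plain"
    self.response.out.write(deferredResult_key.urlsafe())

class queryResults(webapp2.RequestHandler):
  def post(self):
    safe_result_key = self.request.body
    result_key = ndb.Key(urlsafe=safe_result_key)

    result = result_key.get()
    if result is None:
      # Unknown key: there is nothing to poll for.
      self.response.set_status(404)
      return

    self.response.headers["Content-Type"] = "application/json"

    if result.solveComplete:
      self.response.out.write(json.dumps(result.lineups))
    else:
      # Still solving: the frontend keeps polling while it receives [].
      self.response.out.write(json.dumps([]))

app = webapp2.WSGIApplication([
  ('/lineup', getLineups),
  ('/queryLineups', queryResults),  # path assumed from the "queryLineups URL" mentioned below
], debug = True)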

The JavaScript frontend then polls the queryLineups URL for a fixed amount of time. It stops polling when either the time limit expires or it receives data back. I hope this is helpful to anyone else attempting to solve a similar problem. I have a bit more work to do to make it fail gracefully if things get squirrelly, but this works and just needs refinement.
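The polling loop described above runs in the JavaScript frontend. Here the same logic is sketched in Python, to match the rest of the code. `poll_for_result` and its `fetch` callable are hypothetical names. `fetch` stands in for one POST of the result key to the polling endpoint, and it is assumed to return `[]` while the solve is still running, as the handler above does. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.

```python
import time


def poll_for_result(fetch, timeout_s=60.0, interval_s=2.0,
                    clock=time.time, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or time runs out.

    fetch is a hypothetical callable standing in for one request to the
    polling endpoint; it returns [] while the deferred task is still running.
    Returns the result, or None if the time limit expires first.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result:  # non-empty list: the solve has finished
            return result
        if clock() >= deadline:
            return None  # give up; the caller can show an error message
        sleep(interval_s)
```

The handler returns `[]` both while the solve is running and when makeLineups genuinely finds no lineups. This loop cannot tell those two cases apart. Returning an explicit status field, as the first answer below suggests, would remove that ambiguity.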

Answer 1:

I'm not familiar with GAE, but this is a fairly generic question, so I can give you some advice.

Your general idea is correct, so I'm just going to expand on it. The workflow could look like this:

1. You get the request to create the lineups, and you create a new entity in the Datastore for it. It should contain an ID (you'll need it to retrieve the result later) and a status (PENDING|DONE|FAILED). You can also save the data from the request, if that's useful to you.
2. You defer the computation and return a response right away. The response contains the ID of the task. When the computation is done, it saves the result of the task in the Datastore and updates the status of the task. That result contains the task ID, so that we can easily find it.
3. Once the frontend receives the ID, it starts polling for the result. Using setTimeout or setInterval, you send requests with the task ID to the server (this is a separate endpoint). The server checks the status of the task and returns the result if it's done, or an error if it failed.
4. The frontend gets the data and stops polling.
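Here is a minimal sketch of this status workflow. An in-memory dictionary stands in for the Datastore entity. `JobStore`, `create`, `run` and `poll` are hypothetical names, not App Engine APIs. In a real app, `create` corresponds to the request handler, `run` to the deferred task, and `poll` to the polling endpoint.

```python
import uuid

PENDING, DONE, FAILED = "PENDING", "DONE", "FAILED"


class JobStore(object):
    """In-memory stand-in for the Datastore entities described above."""

    def __init__(self):
        self._jobs = {}

    def create(self, request_data):
        # Step 1: record the job as PENDING and hand back its ID.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": PENDING, "input": request_data,
                              "result": None, "error": None}
        return job_id

    def run(self, job_id, compute):
        # Step 2: what the deferred task does: compute, then record the outcome.
        job = self._jobs[job_id]
        try:
            job["result"] = compute(job["input"])
            job["status"] = DONE
        except Exception as exc:
            job["error"] = str(exc)
            job["status"] = FAILED

    def poll(self, job_id):
        # Step 3: what the polling endpoint returns for a given ID.
        job = self._jobs.get(job_id)
        if job is None:
            return {"status": "UNKNOWN"}
        if job["status"] == DONE:
            return {"status": DONE, "result": job["result"]}
        if job["status"] == FAILED:
            return {"status": FAILED, "error": job["error"]}
        return {"status": PENDING}
```

Recording FAILED keeps the frontend from polling forever on a job that will never finish. On App Engine, a deferred task that raises is retried by the task queue. Whether to catch the exception and mark the job FAILED, or let it propagate and be retried, is therefore a design choice.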

Answer 2:

Normally you can't reply to the original request anymore, since the context of that original request disappears. It might work if all of the following held: you return from the request handler without replying, that somehow doesn't kill the client's connection, and you can persist the handler object, restore it in another (internal) request, and use the restored copy to reply to the original request... Kind of a long shot at best.

One option would be to split the operation into a sequence:
- a first request that starts the operation
- one or more subsequent polling requests, until the operation completes and the result is available

Another approach may be possible if the expensive operation mainly works on data that is available before the operation is invoked. You could reorganize the app logic so that partial results are computed as soon as the respective data becomes available. Then, when the final operation is requested, it only operates on pre-computed partial results. By analogy, Google search requests get immediate replies with data from pre-computed indexes instead of waiting for an actual web search to be performed.
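Here is a minimal sketch of the precomputation idea. It assumes, hypothetically, that each player arrives with a name, a position and a projected point total. Players are kept sorted per position as they arrive, so a later request only slices already-sorted lists. `PositionIndex` is illustrative and not part of any App Engine API. In practice the index would be persisted (for example in the Datastore) rather than held in memory.

```python
import bisect
from collections import defaultdict


class PositionIndex(object):
    """Keeps players sorted per position as they arrive, so a later
    request slices pre-sorted lists instead of sorting everything."""

    def __init__(self):
        # position -> list of (-points, name) in ascending order,
        # which puts the highest point totals first.
        self._by_pos = defaultdict(list)

    def add(self, name, position, points):
        # Insert in sorted position; the cost is paid as data arrives.
        bisect.insort(self._by_pos[position], (-points, name))

    def top(self, position, n):
        # Only a slice is needed at request time.
        return [(name, -neg) for neg, name in self._by_pos[position][:n]]
```

`bisect.insort` needs only O(log n) comparisons per insert, although shifting the list is O(n). This spreads the sorting cost over the time the data arrives instead of concentrating it in one request.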

Answer 3:

Well, first, it's already bad to make users wait a whole minute for a page to load. In general, user-facing HTTP requests should take no more than 1 second. The 60 seconds that GAE allows are already too generous for critical situations.

I have several suggestions, but I don't know your application to say what you need:

1. Precompute. Load, compute and store the lineups before users request them. For that you can use GAE backend instances, which can run far longer than 60 seconds.
2. Do users really need that much data? Generally, if there is so much data that the computer has trouble sorting it, it's already too much to show a user. Your users probably only need to see a small part of it (like the top 10 players, or some aggregate statistics). In that case, improving the algorithm used in makeLineups() may do the trick.
3. Defer. If you cannot do 1 or 2, your option is to defer the computation to the Task Queue API. For that, your frontend should:
   - Enqueue a task using Task Queue: https://cloud.google.com/appengine/docs/python/taskqueue/
   - Open a channel to the user using the Channel API: https://cloud.google.com/appengine/docs/python/channel/
   - Save the channel_id for that user to the Datastore.
   - Finish the call. In the UI, show the user a message like "please wait, we're crunching the numbers".
   - Meanwhile, the GAE backend executes the task you enqueued. The task computes the value of makeLineups(). Once done, the task takes the channel_id from the Datastore and sends the computed lineups to it.
   - The user's frontend receives the value and makes the user happy.

Instead of the Task Queue API, there are the new Background Threads, which may be easier and better for your case: https://cloud.google.com/appengine/docs/python/modules/#Python_Background_threads. Basically, instead of enqueueing a task, you call background_thread.BackgroundThread(); the rest stays the same. UPDATE: This works better only with backend modules (basic or manual scaling, not automatic). On frontend (default) modules, custom threads cannot outlive the HTTP request, so they are also limited to 60 seconds.
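Point 2 can be as simple as not sorting everything. This sketch assumes players are dicts with a hypothetical `points` field. `heapq.nlargest` returns the top k items in O(n log k) time, compared with O(n log n) for a full sort. Whether trimming the candidate pool this way fits makeLineups depends on what that function actually needs, and the question doesn't show it.

```python
import heapq


def top_players(players, k=10, key="points"):
    """Return the k highest-scoring players without sorting the whole list.

    players is a list of dicts; key names the (hypothetical) score field.
    The result is ordered from highest to lowest score.
    """
    return heapq.nlargest(k, players, key=lambda p: p[key])
```

For example, top_players(playerList, k=10) would hand only the ten best candidates to the expensive step, assuming playerList entries carry a `points` value.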

Let me know if that helps.

Source: https://stackoverflow.com/questions/34076128/how-do-i-return-data-from-a-deferred-task-in-google-app-engine

Tags

javascript

python

google-app-engine

google-app-engine-python