What's the correct approach for a Twitter Application on Google App Engine?

Question

I am trying to develop a Twitter app on Google App Engine. The app collects tweets from a Twitter user, his/her followers, their followers, and so on. It typically collects 500 tweets per user per run and then inserts that user's data into the database.

The tweet collection has to run every hour. Currently I am using cron jobs for this, but they produce a lot of Deadline exceeded errors, even for a single user, which is not a good sign. I am using Python. What should I use for this? I have searched the web and learned that task queues can be used together with cron, but I have no idea how to do that. I would be very thankful if someone could help me with that. Also, is there any other method or approach I could use?

Answer 1:

To avoid DeadlineExceededError, use multiple deferred push tasks. With Task Queues, it is easier to break the work up into smaller units, which keeps any individual task from exceeding the 10-minute limit that applies to Task Queue tasks.

With the Task Queue API, an application can perform work that is initiated by a user request but runs outside of it. If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queues to be executed later.

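Deferred tasks are push tasks created with the deferred library, which lets you queue an ordinary function call for later execution without writing a dedicated request handler. By default they run as soon as the queue allows, although a delay can be requested. Here is a short sample of how to create a deferred task: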

import logging

from google.appengine.ext import deferred


def do_something_expensive(a, b, c=None):
    # Runs later inside a push task, not inside the request that queued it
    logging.info("Fetching Twitter feeds!")
    # Fetch the Twitter data here


# Somewhere else - Pass in parameters needed by the Twitter API
# Each call queues one independent task
deferred.defer(do_something_expensive, "BobsTwitterParam1", "BobsTwitterParam2", c=True)
deferred.defer(do_something_expensive, "BobsFriendTwitterParam1", "BobsFriendTwitterParam2", c=True)

Your process of fetching data from Twitter users is recursive by nature, since you're fetching data for followers of followers and so forth, and this task as a single process can be quite expensive and would likely exceed the threshold.
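One way to keep that recursion out of a single process is to let each task handle exactly one user and then queue a follow-up task per follower, stopping at a fixed depth. The following is only a minimal sketch under stated assumptions: `crawl_user`, `fetch_and_store_tweets`, `get_follower_ids`, `MAX_DEPTH`, and the `FOLLOWERS` data are all hypothetical names invented for illustration, and the `defer` parameter stands in for `deferred.defer` so the logic can be exercised outside App Engine.

```python
MAX_DEPTH = 2  # assumed limit on how far to follow the follower graph

# Made-up follower graph used only to make the sketch runnable.
FOLLOWERS = {"alice": ["bob", "carol"], "bob": ["dave"]}


def fetch_and_store_tweets(user_id):
    """Hypothetical placeholder: fetch one user's tweets and save them."""


def get_follower_ids(user_id):
    """Hypothetical placeholder: return the follower ids for user_id."""
    return FOLLOWERS.get(user_id, [])


def crawl_user(user_id, depth=0, defer=None):
    """Process one user, then queue one follow-up task per follower."""
    if defer is None:
        # Imported lazily so the sketch can also run without the SDK.
        from google.appengine.ext import deferred
        defer = deferred.defer
    fetch_and_store_tweets(user_id)
    if depth >= MAX_DEPTH:
        return  # stop fanning out at the assumed depth limit
    for follower_id in get_follower_ids(user_id):
        # Each follower becomes its own short-lived task.
        defer(crawl_user, follower_id, depth + 1)
```

Because follower graphs contain cycles, the depth limit is what bounds the work in this sketch; a real crawler would probably also need to record which users it has already visited.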

A task must finish executing and send an HTTP response value between 200–299 within 10 minutes of the original request. This deadline is separate from user requests, which have a 60-second deadline. If your task's execution nears the limit, App Engine raises a DeadlineExceededError (from the module google.appengine.runtime) that you can catch to save your work or log progress before the deadline passes. If the task failed to execute, App Engine retries it based on criteria that you can configure.
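Catching the deadline error to record progress might look roughly like the sketch below. The names `process_batch`, `handle_item`, and `save_checkpoint` are hypothetical, and the fallback class exists only so the example can run where the App Engine SDK is not installed; it is not the real exception.

```python
try:
    from google.appengine.runtime import DeadlineExceededError
except ImportError:
    # Stand-in so the sketch runs outside App Engine; not the real class.
    class DeadlineExceededError(Exception):
        pass


def process_batch(items, handle_item, save_checkpoint):
    """Handle items in order; if the deadline hits, record where to resume.

    Returns the number of items that were fully processed.
    """
    done = 0
    try:
        for item in items:
            handle_item(item)
            done += 1
    except DeadlineExceededError:
        # Very little time is left at this point, so only do something quick.
        save_checkpoint(done)
    return done
```

A caller could use the saved index to queue a new task that continues with the remaining items.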

However, if you give each Twitter user a completely separate task, then each task only runs for as long as it takes to fetch the Twitter results for a single user. Not only is this more efficient, but if there is a problem fetching one user's data, only that task fails while the others continue to execute.

In other words, don't try to fetch all of the data in a single Task.
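Combined with the hourly cron job from the question, this could mean that the cron-triggered handler does nothing except queue one task per user. The sketch below assumes hypothetical names (`enqueue_hourly_fetches`, `fetch_tweets_for_user`) and takes the enqueue function as a parameter; on App Engine you would presumably pass `deferred.defer`.

```python
def fetch_tweets_for_user(user_id):
    """Hypothetical placeholder: fetch and store tweets for one user."""


def enqueue_hourly_fetches(user_ids, defer):
    """Queue one independent task per user and return how many were queued.

    defer -- a callable shaped like deferred.defer(func, *args)
    """
    count = 0
    for user_id in user_ids:
        # The cron request itself stays short: it only queues work.
        defer(fetch_tweets_for_user, user_id)
        count += 1
    return count
```

With this split, the cron request finishes quickly, and a failure while fetching one user affects only that user's task.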

Alternatively, if for whatever reason these tasks still exceed the 10-minute limit, look into Backends.

Source: https://stackoverflow.com/questions/10676264/whats-the-correct-approach-for-a-twitter-application-on-google-app-engine

Tags

python

google-app-engine

task-queue