问题
I am having a problem scheduling a cron job which requires scraping a website and storing it as part of the model (MOVIE) in the database.
The problem is that the model seems to get loaded before Procfile is executed.
How should I create a cron job which runs internally in the background and storing scraped information into the database? Here are my codes:
Procfile:
web: python manage.py runserver 0.0.0.0:$PORT
scheduler: python cinemas/scheduler.py
scheduler.py:
# More code above
from cinemas.models import Movie
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
@sched.scheduled_job('cron', day_of_week='mon-fri', hour=0, minutes=26)
def get_movies_playing_now():
global url_movies_playing_now
Movie.objects.all().delete()
while(url_movies_playing_now):
title = []
description = []
#Create BeatifulSoup Object with url link
s = requests.get(url_movies_playing_now, headers=headers)
soup = bs4.BeautifulSoup(s.text, "html.parser")
movies = soup.find_all('ul', class_='w462')[0]
#Find Movie's title
for movie_title in movies.find_all('h3'):
title.append(movie_title.text)
#Find Movie's description
for movie_description in soup.find_all('ul',
class_='w462')[0].find_all('p'):
description.append(movie_description.text.replace(" [More]","."))
for t, d in zip(title, description):
m = Movie(movie_title=t, movie_description=d)
m.save()
#Go to the next page to find more movies
paging = soup.find( class_='pagenating').find_all('a', class_=lambda x:
x != "inactive")
href = ""
for p in paging:
if "next" in p.text.lower():
href = p['href']
url_movies_playing_now = href
sched.start()
# More code below
cinemas/models.py:
from django.db import models
#Create your models here.
class Movie(models.Model):
movie_title = models.CharField(max_length=200)
movie_description = models.CharField(max_length=20200)
This is the error i am getting when the Job is ran.
2016-11-17T17:57:06.074914+00:00 app[scheduler.1]: Traceback (most recent call last): 2016-11-17T17:57:06.074931+00:00 app[scheduler.1]: File "cinemas/scheduler.py", line 2, in 2016-11-17T17:57:06.075058+00:00 app[scheduler.1]: import cineplex 2016-11-17T17:57:06.075060+00:00 app[scheduler.1]: File "/app/cinemas/cineplex.py", line 1, in 2016-11-17T17:57:06.075173+00:00 app[scheduler.1]: from cinemas.models import Movie 2016-11-17T17:57:06.075196+00:00 app[scheduler.1]: File "/app/cinemas/models.py", line 5, in 2016-11-17T17:57:06.075295+00:00 app[scheduler.1]: class Movie(models.Model): 2016-11-17T17:57:06.075297+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/base.py", line 105, in new 2016-11-17T17:57:06.075414+00:00 app[scheduler.1]: app_config = apps.get_containing_app_config(module) 2016-11-17T17:57:06.075440+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 237, in get_containing_app_config 2016-11-17T17:57:06.075585+00:00 app[scheduler.1]:
self.check_apps_ready() 2016-11-17T17:57:06.075586+00:00 app[scheduler.1]: File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 124, in check_apps_ready 2016-11-17T17:57:06.075703+00:00 app[scheduler.1]: raise AppRegistryNotReady("Apps aren't loaded yet.") 2016-11-17T17:57:06.075726+00:00 app[scheduler.1]: django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
Cron job works fine if I do not include Model objects. How should I run this job every day using Model objects without failing?
Thanks
回答1:
That's because you can't just import the Django packages, models, etc.
In order to work properly, the Django internals require initialization, that's triggered from manage.py
.
Rather than try and re-create all that myself, I always write long-running, non-web commands as a custom management command.
For example, if your app is cinemas
, you would:
- Create
./cinemas/management/commands/scheduler.py
. - In that file, create a sub-class
django.core.management.base.BaseCommand
(that sub-class must be calledCommand
) - In that class, override
handle()
. In your case, that's where you'd callsched.start()
- Your
Procfile
would then havescheduler: python manage.py scheduler
Hope that helps.
回答2:
You can solve the problem with adding the following lines to the top of your sceduler.py
import django
django.setup()
In the django documentation it says
If you’re using components of Django “standalone” – for example, writing a Python script which loads some Django templates and renders them, or uses the ORM to fetch some data – there’s one more step you’ll need in addition to configuring settings.
After you’ve either set DJANGO_SETTINGS_MODULE or called configure(), you’ll need to call django.setup() to load your settings and populate Django’s application registry. For example:
import django from django.conf import settings from myapp import myapp_defaults settings.configure(default_settings=myapp_defaults, DEBUG=True) django.setup() # Now this script or any imported module can use any part of Django it needs. from myapp import models
I set DJANGO_SETTINGS_MODULE as a config variable so didn't add it to my scheduler.
来源:https://stackoverflow.com/questions/40662013/issues-defining-cron-jobs-in-procfile-heroku-using-apscheduler-for-django-proj