Question
I built a scraping module, scraper.py, that can also download files, and I imported it into my Django views. The problem is that in scraper.py the multiprocessing pool is wrapped in an if __name__ == "__main__": check, so when I import the module and call it from the view, the pool never runs because the module is not the main script.
This is the part of scraper.py that uses the pool:

    def download(self, url):
        response = self._is_downloadable(url)
        if response:
            name = response.headers.get('content-disposition')
            fname = re.findall('filename=(.+)', name)
            if len(fname) != 0:
                filename = fname[0]
                filename = filename.replace("\"", "")
                print(filename)
            else:
                filename = "Lecture note"
            with open(filename, 'wb') as files:
                for chunk in response.iter_content(100000):
                    files.write(chunk)

    def download_course_file(self, course):
        username = self._login_data["username"]
        p = Path(f"{username}-{course}.txt").exists()
        if not p:
            self.get_download_links(course)
        statime = time.time()
        if __name__ == "__main__":
            with Pool() as p:
                with open(f"{username}-{course}.txt", "r") as course_link:
                    data = course_link.read().splitlines(False)[::2]
                    p.map(self.download, data)
                    print(data)
        print(f"Process done {time.time()-statime}")

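The effect of the check can be reproduced in isolation. This is a minimal sketch, not taken from the project: fake_scraper.py, guard_passes and load_module are hypothetical names standing in for scraper.py. When a file is imported rather than run directly, __name__ holds the module's name, so a comparison against "__main__" inside any of its functions is false.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for scraper.py (not the real module).
MODULE_SOURCE = '''
def guard_passes():
    # Same comparison as in download_course_file, evaluated inside a function.
    return __name__ == "__main__"

seen_name = __name__
'''


def load_module(path, name):
    """Import the file at `path` under the module name `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fake_scraper.py"
    path.write_text(MODULE_SOURCE)
    imported = load_module(path, "fake_scraper")

imported_name = imported.seen_name      # the module's own name, not "__main__"
guard_result = imported.guard_passes()  # False, so the guarded block is skipped
print(imported_name, guard_result)
```

The same thing happens when Django imports scraper: the code under the check is skipped without any error, and only the final print runs.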
The module is imported in the views and run like this:

    import scraper

    def download_course(request, id):
        course = get_object_or_404(Course, id=id)
        course_name = (course.course_name)[:6]
        person, error = create_session(request)
        if "invalid" in error:
            data = {"error": error}
            return JsonResponse(data)
        person.download_course_file(course_name)
        data = {"success": "Your notes are being downloaded"}
        return JsonResponse(data)

PS: create_session is a function for initialising the scraper object with a username and password.
Is there a workaround for this __name__ check? And even if there isn't, can't I just remove it when I deploy, as long as the server doesn't run Windows?
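One direction that sidesteps the check entirely is sketched below. It is an assumption about what would suit this project, not a confirmed fix. The guard matters for process pools because, under the spawn start method (the default on Windows), each child process re-imports the main module. Threads do not re-import anything, and downloads are mostly I/O-bound, so a thread pool may be enough. download_all and fake_download are hypothetical names; fake_download stands in for the scraper's download method.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(download, urls, max_workers=8):
    """Call download(url) for every URL on a pool of threads.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(download, urls))


def fake_download(url):
    # Hypothetical placeholder: a real version would fetch and save the file.
    return f"downloaded {url}"


results = download_all(fake_download, ["a", "b", "c"])
print(results)
```

Inside download_course_file, the call would take the place of the guarded block, for example download_all(self.download, data). The view would still block until every file is fetched, which this sketch does not address.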
Source: https://stackoverflow.com/questions/61566737/multiprocessing-with-django-when-importing-a-external-module