First off, I am new to Python. It's irrelevant to the question, but I have to mention it. I am creating a crawler as my first project, to understand how things work in Python.
OK, first off I'd like to thank @MikeMcKerns for his comment. There are lots of changes to my script, because I wanted a different approach, but in the end it comes down to these important changes.
My __init__.py now looks much cleaner:
from scraper.Crawl import Crawl

if __name__ == '__main__':
    Crawl()
My download_lesson method inside the scraper.Crawl class now looks like this:
def download_lesson(self, lesson):
    # Stream the response so large files are written in chunks
    # instead of being loaded into memory all at once.
    response = requests.get(lesson['link'], stream=True)
    chunk_size = 1024
    progress = tqdm(
        total=int(response.headers['Content-Length']),
        unit='B',
        unit_scale=True
    )
    with open(lesson['file'], 'wb') as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            progress.update(len(chunk))
            file.write(chunk)
    progress.close()
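One caveat worth noting (my addition, not from the original post): some servers omit the Content-Length header, so indexing response.headers['Content-Length'] directly can raise a KeyError. A small defensive helper could fall back to an indeterminate total, which tqdm accepts:

```python
def content_total(headers):
    # Returns the declared size in bytes, or None when the server
    # omits Content-Length (tqdm treats total=None as an
    # indeterminate progress bar).
    raw = headers.get('Content-Length')
    return int(raw) if raw is not None else None

print(content_total({'Content-Length': '2048'}))  # 2048
print(content_total({}))                          # None
```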
And finally, I have a method dedicated to multiprocessing, which looks like this:
def begin_processing(self):
    pool = ThreadPool(nodes=Helper.config('threads'))
    for course in self.course_data:
        pool.map(self.download_lesson, course['lessons'])
        print(
            'Course "{course_title}" has been downloaded, with total of {lessons_amount} lessons.'.format(
                course_title=course['title'],
                lessons_amount=len(course['lessons'])
            )
        )
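pathos's ThreadPool mirrors the standard library's multiprocessing.pool.ThreadPool API, so the per-course mapping above can be sketched with the stdlib equivalent (the lesson dicts and the fake_download worker here are placeholders, not the original code):

```python
from multiprocessing.pool import ThreadPool

def fake_download(lesson):
    # Placeholder worker: the real method streams the file to disk.
    return lesson['title']

lessons = [{'title': 'Intro'}, {'title': 'Loops'}, {'title': 'Classes'}]

with ThreadPool(processes=4) as pool:
    # map() blocks until every lesson is processed and preserves
    # the input order in its result list, even with threads.
    results = pool.map(fake_download, lessons)

print(results)
```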
So as you can tell, I made some major changes to my class, but most importantly I had to add this bit to my __init__.py:
if __name__ == '__main__':
And secondly, I had to use what @MikeMcKerns suggested I take a look at:
from pathos.threading import ThreadPool
So with those changes, I finally got everything working as I needed. Here's a quick screenshot.
Even though I still have no clue why pathos.multiprocessing makes the tqdm progress bars very buggy, I managed to solve my problem thanks to Mike's suggestion. Thank you!