How to change position of progress bar – multiprocessing

南笙 2021-01-23 04:15

First off, I am new to Python. It's irrelevant to the question, but I have to mention it.

I am creating a crawler as my first project, to understand how things work in

1 answer
  • 2021-01-23 04:41

    OK, first off, I'd like to thank @MikeMcKerns for his comment. My script changed a lot because I wanted a different approach, but in the end it comes down to these important changes.

    My init.py now looks much cleaner:

    from scraper.Crawl import Crawl
    
    if __name__ == '__main__':
        Crawl()
    

    The download_lesson method inside my scraper.Crawl class now looks like this:

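    import requests
    from tqdm import tqdm

    def download_lesson(self, lesson):
        # Stream the response so the file is never held in memory all at once.
        response = requests.get(lesson['link'], stream=True)
        chunk_size = 1024

        # One byte-scaled bar per lesson, sized from the Content-Length header.
        progress = tqdm(
            total=int(response.headers['Content-Length']),
            unit='B',
            unit_scale=True
        )

        with open(lesson['file'], 'wb') as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                progress.update(len(chunk))
                file.write(chunk)

        progress.close()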
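
Since the question is about where each bar is drawn, here is a minimal sketch of one way to pin every bar to its own terminal row with tqdm's `position` argument. It is not the code from the answer above: the names `copy_with_bar` and `run_all` and the fake payloads are made up for illustration, the "download" is an in-memory copy so nothing touches the network, and the pool is the standard library's `multiprocessing.pool.ThreadPool` rather than pathos. tqdm is a third-party package, so the sketch skips the bars if it is not installed.

```python
import io
from multiprocessing.pool import ThreadPool

try:
    from tqdm import tqdm
except ImportError:  # tqdm is third-party; without it the copy runs with no bars
    tqdm = None


def copy_with_bar(job):
    """Copy one in-memory payload in 1 KiB chunks, drawing its bar on a fixed row."""
    position, name, payload = job  # hypothetical job layout: (row, label, bytes)
    bar = None
    if tqdm is not None:
        # position=N asks tqdm to draw this bar N lines below the first one,
        # so bars updated from different threads do not overwrite each other.
        bar = tqdm(total=len(payload), unit='B', unit_scale=True,
                   desc=name, position=position)
    source = io.BytesIO(payload)
    target = io.BytesIO()
    while True:
        chunk = source.read(1024)
        if not chunk:
            break
        target.write(chunk)
        if bar is not None:
            bar.update(len(chunk))
    if bar is not None:
        bar.close()
    return name, target.getvalue()


def run_all(payloads, threads=2):
    """Give each payload its own row index, then copy them on a thread pool."""
    jobs = [(row, name, data) for row, (name, data) in enumerate(payloads.items())]
    with ThreadPool(processes=threads) as pool:
        return dict(pool.map(copy_with_bar, jobs))


if __name__ == '__main__':
    run_all({'lesson-1': b'a' * 5000, 'lesson-2': b'b' * 3000})
```

Using the job's index as the row keeps the sketch simple, but with many lessons the rows would run past the bottom of the terminal, so a real script would probably want to reuse a small set of rows.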
    

    And finally, I have a method dedicated to running the downloads in parallel, which looks like this:

    def begin_processing(self):
        # Thread pool sized from the 'threads' config value.
        pool = ThreadPool(nodes=Helper.config('threads'))

        for course in self.course_data:
            # map() blocks until every lesson of this course has been downloaded.
            pool.map(self.download_lesson, course['lessons'])
            print(
                'Course "{course_title}" has been downloaded, with total of {lessons_amount} lessons.'.format(
                    course_title=course['title'],
                    lessons_amount=len(course['lessons'])
                )
            )
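
The loop above relies on `map` not returning until every lesson of the current course is done, which is why the summary line can be printed right after it. A rough sketch of that pattern follows, assuming the standard library's `multiprocessing.pool.ThreadPool` as a stand-in for the pathos pool; `fake_download`, `download_courses`, and the sample course data are hypothetical and only illustrate the control flow.

```python
from multiprocessing.pool import ThreadPool


def fake_download(lesson):
    """Stand-in for download_lesson: pretend the file was written and return its name."""
    return lesson['file']


def download_courses(courses, threads=4):
    """Download the courses one after another, with each course's lessons in parallel."""
    finished = []
    pool = ThreadPool(processes=threads)
    try:
        for course in courses:
            # map() returns only after all lessons of this course are done,
            # and the results come back in the same order as the input list.
            files = pool.map(fake_download, course['lessons'])
            finished.append((course['title'], len(files)))
    finally:
        pool.close()
        pool.join()
    return finished


if __name__ == '__main__':
    sample = [
        {'title': 'Python basics', 'lessons': [{'file': '1.mp4'}, {'file': '2.mp4'}]},
        {'title': 'Scraping', 'lessons': [{'file': 'intro.mp4'}]},
    ]
    for title, count in download_courses(sample):
        print('Course "{}" has been downloaded, with total of {} lessons.'.format(title, count))
```

Courses are still handled sequentially here. Only the lessons inside one course overlap, so at most one course's worth of progress bars is on screen at a time.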
    

    So as you can tell, I made some major changes to my class, but most importantly I had to add this bit to my init.py:

    if __name__ == '__main__':
    

    And secondly, I had to use what @MikeMcKerns suggested I take a look at:

    from pathos.threading import ThreadPool

    So with those changes, I finally got everything working as I needed.

    Even though I still have no clue why pathos.multiprocessing makes the tqdm progress bars so buggy, I managed to solve my problem thanks to Mike's suggestion. Thank you!
