Question
I asked a question here about multiprocessing a few days ago, and one user posted the answer that you can see below. The only problem is that this answer worked on his machine and does not work on mine.
I have tried on Windows (Python 3.6) and on Mac (Python 3.8). I have run the code in the basic Python IDLE that came with the installation, in PyCharm on Windows, and in Jupyter Notebook, and nothing happens. I have 32-bit Python. This is the code:
from bs4 import BeautifulSoup
import requests
from datetime import date, timedelta
from multiprocessing import Pool
import tqdm
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
def parse(url):
    print("im in function")
    response = requests.get(url[4], headers = headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    all_skier_names = soup.find_all("div", class_ = "g-xs-10 g-sm-9 g-md-4 g-lg-4 justify-left bold align-xs-top")
    all_countries = soup.find_all("span", class_ = "country__name-short")
    discipline = url[0]
    season = url[1]
    competition = url[2]
    gender = url[3]
    out = []
    for name, country in zip(all_skier_names , all_countries):
        skier_name = name.text.strip().title()
        country = country.text.strip()
        out.append([discipline, season, competition, gender, country, skier_name])
    return out
all_urls = [['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
            ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode='],
            ['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
            ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode=']]
with Pool(processes=2) as pool, tqdm.tqdm(total=len(all_urls)) as pbar:
    all_data = []
    print("im in pool")
    for data in pool.imap_unordered(parse, all_urls):
        print("im in data")
        all_data.extend(data)
        pbar.update()
print(all_data)
The only thing I see when I run the code is the progress bar, which always stays at 0%:
0%| | 0/8 [00:00<?, ?it/s]
I put a couple of print statements in the parse(url) function and in the for loop at the end of the code, but the only thing that gets printed is "im in pool".
It seems like the code does not enter the function at all, and it does not enter the for loop at the end of the code.
The code should execute in 5-8 seconds, but I have been waiting for 10 minutes and nothing is happening. I have also tried this without the progress bar, but the result is the same.
Do you know what the problem is? Is it a problem with the version of Python that I'm using (Python 3.6 32-bit) or the version of some library? I don't know what to do...
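One possible cause, offered as a guess rather than a confirmed diagnosis: on Windows, and on macOS from Python 3.8, multiprocessing starts workers with the "spawn" method, which re-imports the main module in every child process. The multiprocessing documentation therefore asks for the pool to be created under an `if __name__ == "__main__":` guard, and notes that Pool may not work from an interactive interpreter such as IDLE or a notebook cell. A minimal sketch of the guarded layout follows. To keep it runnable without network access, `parse` here is a stand-in for the scraper, and the `collect` helper, the sample rows, and the `example.invalid` URLs are hypothetical and not part of the code above.

```python
from multiprocessing import Pool


def parse(row):
    # Stand-in for the real parse(url): no network access, it only
    # repackages the metadata so the sketch runs anywhere.
    discipline, season, competition, gender, url = row
    return [[discipline, season, competition, gender, len(url)]]


def collect(rows, processes=2):
    # Same pattern as in the question: imap_unordered yields results as
    # workers finish, so the order of all_data is not guaranteed.
    all_data = []
    with Pool(processes=processes) as pool:
        for data in pool.imap_unordered(parse, rows):
            all_data.extend(data)
    return all_data


if __name__ == "__main__":
    # Only the process that was started as a script reaches this block;
    # spawned workers re-import the module without creating new pools.
    # Hypothetical rows; the URLs are placeholders, not real endpoints.
    rows = [
        ["Cross-Country", "2020", "World Cup", "M", "https://example.invalid/m"],
        ["Cross-Country", "2020", "World Cup", "L", "https://example.invalid/l"],
    ]
    print(sorted(collect(rows)))
```

Saving this as a .py file and running it from a terminal (python script.py) rather than from IDLE or a notebook cell should help show whether the guard and the way the script is launched are what matter here.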
Source: https://stackoverflow.com/questions/59892469/multiprocessing-for-webscrapping-wont-start-on-windows-and-mac