Question
I've written a script in Python, using pyppeteer along with asyncio, to scrape the links of different posts from the landing page and then get the title of each post by following each URL to its inner page. The content I'm parsing here is not dynamic; I used pyppeteer and asyncio only to see how efficiently the scraping performs asynchronously.
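Since the point is to see what asyncio buys here, it helps to compare awaiting coroutines one at a time with handing them all to asyncio.gather() at once. The sketch below uses asyncio.sleep() as a stand-in for page loads; `load()`, `sequential()`, `concurrent()`, `timed()` and the 0.1 s delays are illustrative assumptions, not part of the original script.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def load(delay: float) -> float:
    # Stand-in for a page load: wait `delay` seconds, then return it.
    await asyncio.sleep(delay)
    return delay


async def sequential(delays: List[float]) -> List[float]:
    # Each await finishes before the next load starts: total time ~ sum(delays).
    return [await load(d) for d in delays]


async def concurrent(delays: List[float]) -> List[float]:
    # All coroutines are handed to gather() at once: total time ~ max(delays).
    return await asyncio.gather(*(load(d) for d in delays))


def timed(
    runner: Callable[[List[float]], Awaitable[List[float]]], delays: List[float]
) -> Tuple[List[float], float]:
    # Run one strategy in a fresh event loop and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(runner(delays))
    return result, time.perf_counter() - start
```

With three 0.1 s delays, `timed(sequential, [0.1, 0.1, 0.1])` should take roughly 0.3 s and `timed(concurrent, [0.1, 0.1, 0.1])` roughly 0.1 s. A list comprehension with `await` inside it, as in `sequential()`, gives no concurrency at all.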
The following script runs fine for a while but then fails with this error:

  File "C:\Users\asyncio\tasks.py", line 526, in ensure_future
    raise TypeError('An asyncio.Future, a coroutine or an awaitable is '
TypeError: An asyncio.Future, a coroutine or an awaitable is required

This is what I've written so far:

import asyncio
from pyppeteer import launch

link = "https://stackoverflow.com/questions/tagged/web-scraping"

async def fetch(page,url):
    await page.goto(url)
    linkstorage = []
    elements = await page.querySelectorAll('.summary .question-hyperlink')
    for element in elements:
        linkstorage.append(await page.evaluate('(element) => element.href', element))
    tasks = [await browse_all_links(link, page) for link in linkstorage]
    results = await asyncio.gather(*tasks)
    return results

async def browse_all_links(link, page):
    await page.goto(link)
    title = await page.querySelectorEval('.question-hyperlink','(e => e.innerText)')
    print(title)

async def main(url):
    browser = await launch(headless=True,autoClose=False)
    page = await browser.newPage()
    await fetch(page,url)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(main(link))
    loop.run_until_complete(future)
    loop.close()

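The traceback points at asyncio.gather(), which passes each of its arguments to ensure_future(); that function accepts only futures, coroutines and other awaitables. In `fetch()`, the comprehension `[await browse_all_links(link, page) for link in linkstorage]` already awaits every call, one after another, and because `browse_all_links()` returns nothing, `tasks` ends up as a list of `None` values. That also explains why the script seems to work for a while: every page is visited sequentially inside the comprehension, and the error only appears once gather() is finally reached. The minimal sketch below reproduces this without pyppeteer; `visit()`, `broken()`, `fixed()` and `demo()` are hypothetical names, with `visit()` standing in for `browse_all_links()`.

```python
import asyncio
from typing import List


async def visit(n: int) -> None:
    # Hypothetical stand-in for browse_all_links(): does some work, returns None.
    await asyncio.sleep(0)


async def broken() -> List[None]:
    # `await` inside the comprehension runs each call to completion right here,
    # so `results` is [None, None, None]: plain values, not awaitables.
    results = [await visit(n) for n in range(3)]
    return await asyncio.gather(*results)  # gather(None, ...) raises TypeError


async def fixed() -> List[None]:
    # Build the coroutine objects without awaiting them; gather() schedules
    # them all and waits for them together.
    coros = [visit(n) for n in range(3)]
    return await asyncio.gather(*coros)


def demo() -> str:
    # Return the TypeError message raised by broken(), if any.
    try:
        asyncio.run(broken())
    except TypeError as exc:
        return str(exc)
    return "no error"
```

`demo()` returns the same message as the traceback above. Dropping the `await` from the comprehension is enough to get past the TypeError, but on its own it is not enough for the pyppeteer script: all the coroutines would then share one `page` and keep navigating it away from each other.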
My question: how can I get rid of that error and do the scraping asynchronously?
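One possible fix, sketched under a few assumptions: pass coroutine objects (not their awaited results) to asyncio.gather(), and give each concurrent task its own tab, because several goto() calls on one shared page would keep interrupting each other. The helper names (`gather_limited`, `scrape`, `fetch_title`), the limit of 5 open tabs, and the use of pyppeteer's `querySelectorAllEval` to collect every href in one call are my choices, not part of the original script. The CSS selectors are copied from the question and may no longer match Stack Overflow's current markup. pyppeteer is imported inside `scrape()` only so the pure-asyncio helper can be tried without it installed.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def gather_limited(
    links: List[str],
    fetch_title: Callable[[str], Awaitable[Optional[str]]],
    limit: int = 5,
) -> List[Optional[str]]:
    """Call fetch_title(link) for every link concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(link: str) -> Optional[str]:
        # Wait for a free slot before starting this fetch.
        async with semaphore:
            return await fetch_title(link)

    # Coroutine objects are created here but NOT awaited; gather() runs them together
    # and returns their results in the same order as `links`.
    return await asyncio.gather(*(bounded(link) for link in links))


async def scrape(url: str, limit: int = 5) -> List[Optional[str]]:
    # Imported here so gather_limited() above works without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Collect every post URL from the landing page in a single evaluate call.
        links = await page.querySelectorAllEval(
            ".summary .question-hyperlink", "(els) => els.map(e => e.href)"
        )

        async def fetch_title(link: str) -> Optional[str]:
            # One tab per task: concurrent goto() calls on a shared page would
            # keep navigating away from each other.
            tab = await browser.newPage()
            try:
                await tab.goto(link)
                return await tab.querySelectorEval(
                    ".question-hyperlink", "(e) => e.innerText"
                )
            finally:
                await tab.close()

        return await gather_limited(links, fetch_title, limit)
    finally:
        # Always shut Chromium down, even if a page fails to load.
        await browser.close()
```

Calling `asyncio.run(scrape(link))` with the question's `link` should return the titles in the same order as the links. The semaphore keeps a long list of links from opening dozens of Chromium tabs at once; raising `limit` trades memory for speed.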
Source: https://stackoverflow.com/questions/53769321/scraping-content-using-pyppeteer-in-association-with-asyncio