问题
I want to extract the total page number (11 in this case) from a steam page. I believe that the following code should work (return 11), but it is returning an empty list. Like if it is not finding paged_items_paging_pagelink
class.
import requests
import re
from bs4 import BeautifulSoup
r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
c = r.content
soup = BeautifulSoup(c, 'html.parser')
total_pages = soup.find_all("span",{"class":"paged_items_paging_pagelink"})[-1].text
回答1:
If you check the page source, the content you want is not available. It means that it is generated dynamically through Javascript.
The page numbers are located inside the <span id="NewReleases_links">
tag, but in the page source the HTML shows only this:
<span id="NewReleases_links"></span>
Easiest way to handle this is using Selenium.
But, if you look at the page source, the text Showing 1-20 of 213 results
is available. So, you can scrape this and calculate the number of pages.
Required HTML:
<div class="paged_items_paging_summary ellipsis">
Showing
<span id="NewReleases_start">1</span>
-
<span id="NewReleases_end">20</span>
of
<span id="NewReleases_total">213</span>
results
</div>
Code:
import requests
from bs4 import BeautifulSoup
r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.text, 'lxml')
def get_pages_no(soup):
total_items = int(soup.find('span', id='NewReleases_total').text)
items_per_page = int(soup.find('span', id='NewReleases_end').text)
return round(total_items/items_per_page)
print(get_pages_no(soup))
# prints 11
(Note: I still recommend the use of Selenium, as most of the content from this site is dynamically generated. It'll be a pain to scrape all the data like this.)
回答2:
An alternative faster way without using BeautifulSoup
:
import requests
url = "http://store.steampowered.com/contenthub/querypaginated/tags/NewReleases/render/?query=&start=20&count=20&cc=US&l=english&no_violence=0&no_sex=0&v=4&tag=RPG" # This returns your query in json format
r = requests.get(url)
print(round(r.json()['total_count'] / 20)) # total_count = number of records, 20 = number of pages shown
11
来源:https://stackoverflow.com/questions/49035188/finding-number-of-pages-using-python-beautifulsoup