Question
# import libraries
import requests
from bs4 import BeautifulSoup

links = set()
# "skeleton" of the url
base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
# the site has 1300 pages, and I want to parse all of them
count = 1301
for i in range(count):
    url = base_url.format(i)
    # send a GET request to the url
    request = requests.get(url)
    # print the current page number
    print(f"Extracting Page#: {i}")
    # process the response using bs4
    soup = BeautifulSoup(request.content, 'html5lib')
    urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
    for parseuk in urlparse:
        # print the hrefs that I need
        hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
        for a in hrefUK:
            z = a["href"]
            print("var z = ", z)
When launched, it only shows links from the first page. i changes, but the code keeps parsing only the first page, and it repeats this 1300 times. Why?
Answer 1:
I am not exactly sure what you are asking, and I don't think the code below is necessarily an answer, but it might clean things up a bit. There is no need for your first for loop in s_p:
import requests
from bs4 import BeautifulSoup

def s_p():
    base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
    count = 1301
    for i in range(count):
        url = base_url.format(i)
        # `session` is expected to be a requests.Session (see the usage sketch below)
        request = session.get(url)
        soup = BeautifulSoup(request.content, 'html5lib')
        urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
        for parseuk in urlparse:
            hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
            for a in hrefUK:
                z = a["href"]
                print("var z = ", z)
Answer 2:
I don't know exactly what you are trying to do; your code includes quite a few mistakes. As far as I understood, you want to iterate over the pages and collect the href links.
Loop over them using q=0#p{i}_popular_desc:
import requests
from bs4 import BeautifulSoup

links = set()
for i in range(1, 10):
    print(f"Extracting Page#: {i}")
    r = requests.get(
        f"https://steamcommunity.com/market/search?appid=730&q=0#p{i}_popular_desc")
    soup = BeautifulSoup(r.text, 'html.parser')
    for item in soup.findAll('a', attrs={'class': 'market_listing_row_link'}):
        links.add(item.get('href'))

for item in links:
    print(item)
Or use the API directly from here:
https://steamcommunity.com/market/search/render/?query=&start=0&count=10&search_descriptions=0&sort_column=popular&sort_dir=desc&appid=730
so you will not get blocked, need to use Tor, or have to keep changing the user-agent.
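For example, a rough sketch of paging through that render endpoint with requests. The JSON field names results_html and total_count are an assumption about what the endpoint returns, not something stated in the answer; check the actual response and adjust:

import time

import requests
from bs4 import BeautifulSoup

links = set()
start, page_size = 0, 100

while True:
    # assumption: the render endpoint takes the same parameters as the URL above,
    # with start/count controlling pagination, and returns JSON
    r = requests.get(
        "https://steamcommunity.com/market/search/render/",
        params={
            "query": "",
            "start": start,
            "count": page_size,
            "search_descriptions": 0,
            "sort_column": "popular",
            "sort_dir": "desc",
            "appid": 730,
        },
    )
    data = r.json()
    # assumption: the listing rows come back as an HTML fragment under "results_html"
    soup = BeautifulSoup(data.get("results_html", ""), "html.parser")
    for a in soup.find_all("a", attrs={"class": "market_listing_row_link"}):
        links.add(a.get("href"))
    start += page_size
    # assumption: "total_count" reports the total number of listings
    if start >= data.get("total_count", 0):
        break
    time.sleep(1)  # be polite to the server

for link in links:
    print(link)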
Source: https://stackoverflow.com/questions/59163644/why-cycle-repeats-and-doesnt-change-variable