Question
My code:
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://realpython.com/practice/profiles.html"
html_page = urlopen(url)
html_text = html_page.read()
soup = BeautifulSoup(html_text)
links = soup.find_all('a', href = True)
files = []
base = "https://realpython.com/practice/"
def page_names():
    for a in links:
        files.append(base + a['href'])
page_names()
for i in files:
    all_page = urlopen(i)
all_text = all_page.read()
all_soup = BeautifulSoup(all_text)
print all_soup
The first half of the code collects three links; the second half is supposed to print the HTML of each one.
Sadly, it only prints the last link's HTML.
Possibly because of
for i in files:
    all_page = urlopen(i)
It worked previously with 8 lines of code doing the job of the for i in files: loop, but I wanted to clean it up and got it down to those two. Well, clearly not, because it doesn't work.
No error though!
Answer 1:
You only store the last value in your loop. You need to move all the assignments and the print inside the loop:
for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
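If you are going to use functions, I would pass the inputs as parameters and build the list inside the function. Relying on module-level globals can give unexpected output (for example, calling page_names() twice would append every link to files twice):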
def page_names(b,lnks):
    files = []
    for a in lnks:
        files.append(b + a['href'])
    return files
for i in page_names(base,links):
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
Your function can then return a list comprehension:
def page_names(b,lnks):
    return [b + a['href'] for a in lnks]
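As a quick check of the list-comprehension version, here is a minimal sketch that runs without a network connection. It assumes Python 3 (print as a function), and it uses plain dicts as stand-ins for the Tag objects that soup.find_all returns, since both support a['href']. The three file names are made up for illustration.

```python
def page_names(b, lnks):
    # Build one absolute URL per link by prefixing the base URL.
    return [b + a['href'] for a in lnks]

base = "https://realpython.com/practice/"

# Hypothetical stand-ins for bs4 Tag objects; the file names are invented.
links = [{'href': 'one.html'}, {'href': 'two.html'}, {'href': 'three.html'}]

files = page_names(base, links)
for f in files:
    print(f)
```

Because the function builds and returns a fresh list, calling it twice gives the same result both times instead of growing a shared global list.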
Answer 2:
In your for loop you are assigning to all_page, which is overwritten on each pass through the loop, so after the loop it holds only the value from the last iteration.
If you want it to print the all_soup for each page you could just indent those 3 lines to be inside the for loop as well, then they would be executed each time through the loop.
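To see the overwriting on its own, here is a small sketch that puts the two loop shapes side by side. It assumes Python 3, and fake_fetch is a hypothetical stand-in for urlopen(i).read() so that nothing is downloaded. The file names are illustrative only.

```python
def fake_fetch(url):
    # Hypothetical replacement for urlopen(url).read(); not a real library call.
    return "<html>" + url + "</html>"

files = ["a.html", "b.html", "c.html"]  # illustrative names only

# Buggy shape: only the assignment is inside the loop,
# so `page` is overwritten on every pass.
for i in files:
    page = fake_fetch(i)
after_loop = [page]  # holds just the value from the last iteration

# Fixed shape: the code that uses `page` is indented into the loop,
# so it runs once per URL.
inside_loop = []
for i in files:
    page = fake_fetch(i)
    inside_loop.append(page)

print(len(after_loop), len(inside_loop))
```

The only difference between the two halves is indentation, which is why Python raises no error in the buggy version.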
Answer 3:
It seems to be just a formatting issue; you probably meant to print it in the loop, right?
for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
Source: https://stackoverflow.com/questions/29733903/python-parse-from-list-only-prints-last-item-not-all