Python: Parse from list only prints last item, not all?

Question

My code:

from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "https://realpython.com/practice/profiles.html"

html_page = urlopen(url)
html_text = html_page.read()

soup = BeautifulSoup(html_text)

links = soup.find_all('a', href = True)

files = []
base = "https://realpython.com/practice/"


def page_names():
    for a in links:
        files.append(base + a['href'])

page_names()

for i in files:
    all_page = urlopen(i)

all_text = all_page.read()
all_soup = BeautifulSoup(all_text)
print all_soup

The first half of the script collects three links; the second half is supposed to print the HTML of each of them.

Sadly, it only prints the last link's HTML.

I suspect the problem is this part:

for i in files:
    all_page = urlopen(i)

It worked previously with 8 lines of code doing the job of the for i in files: loop, but I wanted to clean it up and got it down to those two lines. Clearly that went too far, because it no longer works.

No error though!

Answer 1:

Only the last value survives your loop, because each iteration overwrites all_page. Move the read, the parse, and the print inside the loop so they run for every URL:

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
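The code above is Python 2 (urllib2 and the print statement). As a minimal sketch assuming Python 3, the same fix could look like this; read_all_pages and its opener parameter are hypothetical names, and opener exists only so the loop can be exercised without network access.

```python
from urllib.request import urlopen


def read_all_pages(urls, opener=urlopen):
    """Fetch every URL and return the raw bytes of each page, in order.

    `opener` defaults to urllib.request.urlopen; it is a hypothetical hook
    so the loop can be tried offline with a fake opener.
    """
    pages = []
    for url in urls:
        # Everything that depends on `url` stays inside the loop body,
        # so each page is read before the next iteration rebinds the name.
        with opener(url) as response:
            pages.append(response.read())
    return pages
```

With the question's files list, something like for html in read_all_pages(files): print(BeautifulSoup(html, "html.parser")) should print every page rather than only the last one.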

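If you are going to use a function, I would pass the values in as parameters and build the list inside it; otherwise, because the original page_names() appends to the global files list, you might get unexpected output (calling it twice, for example, adds every URL twice):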

def page_names(b,lnks):
    files = []
    for a in lnks:
        files.append(b + a['href'])
    return files


for i in page_names(base,links):
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
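To make the "unexpected output" concrete, here is a small Python 3 sketch. It assumes plain dicts as stand-ins for the bs4 tags (both support a["href"]) and uses hypothetical file names; page_names_global is a hypothetical name for the question's original, global-list version.

```python
# Stand-ins for the question's globals; file names are hypothetical.
base = "https://realpython.com/practice/"
links = [{"href": "page1.html"}, {"href": "page2.html"}]

files = []


def page_names_global():
    # Original style: appends to the module-level `files` list,
    # so every call adds the same URLs again.
    for a in links:
        files.append(base + a["href"])


def page_names(b, lnks):
    # Parameterised style from the answer: a fresh list on every call.
    files = []
    for a in lnks:
        files.append(b + a["href"])
    return files
```

Calling page_names_global() twice leaves four entries in files, while page_names(base, links) returns the same two URLs no matter how often it is called.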

Your function can then return a list comprehension:

def page_names(b,lnks):
    return [b + a['href'] for a in lnks]
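A quick Python 3 check of the list-comprehension version, again with dicts standing in for bs4 tags. The second function is a variant not taken from the answer: it swaps string concatenation for the standard library's urllib.parse.urljoin, which resolves each href against the base URL and leaves absolute hrefs intact; page_names_urljoin is a hypothetical name.

```python
from urllib.parse import urljoin


def page_names(b, lnks):
    # List-comprehension version, as in the answer.
    return [b + a["href"] for a in lnks]


def page_names_urljoin(b, lnks):
    # Variant (not from the answer): resolve each href against the base,
    # so an absolute href is kept as-is instead of being glued onto `b`.
    return [urljoin(b, a["href"]) for a in lnks]
```

For relative links both functions agree; they only differ when a page links to an absolute URL.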

Answer 2:

In your for loop you are assigning to all_page, which overwrites it on every pass, so after the loop it only holds the value from the last iteration.

If you want to print all_soup for each page, just indent those three lines so they are inside the for loop as well; then they run on every pass through the loop.
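The rebinding behaviour can be seen without any network code. In this Python 3 sketch, fetch_stub is a hypothetical stand-in for urlopen(url).read(), and the two functions differ only in whether the line that uses page is indented into the loop.

```python
def fetch_stub(url):
    # Hypothetical stand-in for urlopen(url).read(); no network involved.
    return "<html of %s>" % url


def loop_then_use(urls):
    for url in urls:
        page = fetch_stub(url)
    # Dedented: runs once, after the loop, so `page` is the last URL's result.
    # (With an empty `urls`, `page` would never be bound at all.)
    return [page]


def use_inside_loop(urls):
    seen = []
    for url in urls:
        page = fetch_stub(url)
        # Indented: runs on every iteration, before `page` is rebound.
        seen.append(page)
    return seen
```

loop_then_use reproduces the question's symptom (one result, from the last URL), while use_inside_loop yields one result per URL.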

Answer 3:

It seems to be just an indentation issue: you probably meant to print inside the loop, right?

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup

Source: https://stackoverflow.com/questions/29733903/python-parse-from-list-only-prints-last-item-not-all

Tags

python, parsing, for-loop, printing, beautifulsoup