Python: Parse from list only prints last item, not all?

Posted by 烂漫一生 on 2021-01-29 02:05:33

Question


My code:

from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "https://realpython.com/practice/profiles.html"

html_page = urlopen(url)
html_text = html_page.read()

soup = BeautifulSoup(html_text)

links = soup.find_all('a', href = True)

files = []
base = "https://realpython.com/practice/"


def page_names():
    for a in links:
        files.append(base + a['href'])

page_names()

for i in files:
    all_page = urlopen(i)

all_text = all_page.read()
all_soup = BeautifulSoup(all_text)
print all_soup

The first half of the script collects three links; the second half is supposed to print the HTML of each of them.

Sadly, it only prints the last link's HTML.

Possibly because of:

for i in files:
    all_page = urlopen(i)

It worked previously, when the for i in files: loop took 8 lines of code, but I wanted to clean it up and got it down to those two. Clearly that was a mistake, because it no longer works.

No error though!


Answer 1:


Your loop only keeps the last value, because each iteration overwrites all_page. You need to move the remaining assignments and the print inside the loop:

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup

If you are going to use functions, I would pass in parameters and create the list inside the function; otherwise you might get unexpected output:

def page_names(b,lnks):
    files = []
    for a in lnks:
        files.append(b + a['href'])
    return files


for i in page_names(base,links):
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup

Your function can then return a list comprehension:

def page_names(b,lnks):
    return [b + a['href'] for a in lnks]
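
One way the module-level list can lead to unexpected output is when the function gets called more than once. The following is a minimal sketch of that case, not the asker's actual data: plain dicts stand in for the BeautifulSoup tags, the href values are made up, and page_names_global is a hypothetical name for the original parameterless version.

```python
# Plain dicts stand in for the BeautifulSoup tags; a["href"] works the same way.
base = "https://realpython.com/practice/"
links = [{"href": "one.html"}, {"href": "two.html"}]  # made-up hrefs

files = []


def page_names_global():
    # Appends to the module-level list, so every call adds the links again.
    for a in links:
        files.append(base + a["href"])


def page_names(b, lnks):
    # Builds and returns a fresh list on every call.
    return [b + a["href"] for a in lnks]


page_names_global()
page_names_global()
duplicated = list(files)          # four entries: each link appears twice

first = page_names(base, links)
second = page_names(base, links)  # still two entries, identical to `first`
```

Because page_names only depends on its arguments, calling it again cannot change the result of an earlier call.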



Answer 2:


In your for loop you are assigning to all_page, which overwrites it on each pass through the loop, so afterwards it only holds the value from the last iteration.

If you want it to print the all_soup for each page you could just indent those 3 lines to be inside the for loop as well, then they would be executed each time through the loop.
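
The overwrite can be reproduced without any network access. In this minimal sketch, fake_fetch is a hypothetical stand-in for urlopen(i).read(), and the file names are made up for illustration:

```python
# Hypothetical stand-in for urlopen(url).read(); no network access needed.
def fake_fetch(url):
    return "<html>" + url + "</html>"


files = ["a.html", "b.html", "c.html"]  # made-up names for illustration

# Buggy shape: only the assignment is inside the loop.
for i in files:
    page = fake_fetch(i)
last_only = [page]  # `page` now refers to the final result only

# Fixed shape: the work that uses `page` is inside the loop body.
every_page = []
for i in files:
    page = fake_fetch(i)
    every_page.append(page)
```

After the first loop, last_only holds a single page (the one for c.html), while every_page holds one entry per file.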




Answer 3:


It seems to be just a formatting issue. You probably meant to print it in the loop, right?

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup


Source: https://stackoverflow.com/questions/29733903/python-parse-from-list-only-prints-last-item-not-all
