Question
Use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link and repeat the process a number of times and report the last name you find.
The data file is at http://py4e-data.dr-chuck.net/known_by_Caragh.html
So I have to find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.
- Can someone explain to me, line by line and in detail, how these two loops (the "while" and the "for") work?
- When I enter position 18, does it extract the 18th href tag, then the 18th on the next page, and so on 7 times? Because even if I enter a different number I'm still getting the same answer. Thank you so much in advance.
Code:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = input("Enter URL:")
numbers = input("Enter count:")
position = input("Enter position:")
while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
        count = count + 1
        if count == 18:
            url = tag.get('href', None)
            print("Retrieving:" , url)
            count = 0
            break
n = n + 1
Answer 1:
"Because even if I enter a different number I'm still getting the same answer."
You're getting the same answer because you've hard-coded both values with:
while n < 7
and
if count == 18
I think you meant to use your input variables there. You'll also need to convert those inputs to int, because input() currently stores them as str. Also note that I didn't want to type in the URL each time, so I hard-coded it; you can uncomment your input line and comment out the line url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
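To illustrate why the int conversion matters: input() always returns a string, and in Python a string never compares equal to an integer, so a comparison such as count == position would silently never match without it. A minimal sketch, where position_text is a hypothetical name and a literal stands in for what the user would type:

```python
# input() returns text; a literal string stands in for the typed value here.
position_text = "18"   # what input("Enter position:") would hand back
count = 18

print(count == position_text)        # False: an int never equals a str
print(count == int(position_text))   # True once the input is converted
```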
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")
numbers = int(input("Enter count:"))
position = int(input("Enter position:"))
while n < numbers:  # <----- there's your variable for how many times to repeat
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
        count = count + 1
        if count == position:  # <------- and the variable for the position
            url = tag.get('href', None)
            print("Retrieving:" , url)
            count = 0
            break
    n = n + 1  # <---- I fixed your indentation. The way it was before, n never incremented, so the while loop could never exit.
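As for how the two loops cooperate: the outer loop runs once per page, and the inner loop only counts anchor tags until it reaches the requested position, at which point that tag's href becomes the next URL to fetch. The same logic can be sketched so that it runs without a network connection. This is only an illustration under stated assumptions: it swaps BeautifulSoup for the standard library's html.parser, replaces the counting loop with list indexing, and the names HrefCollector, follow_links, fetch and FAKE_PAGES are made up for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def follow_links(start_url, position, repeats, fetch):
    """Follow the link at `position` (1-based) `repeats` times.

    `fetch` is any callable that maps a URL to HTML text.
    Returns the list of URLs retrieved, in order.
    """
    url = start_url
    visited = []
    for _ in range(repeats):               # plays the role of the while loop
        parser = HrefCollector()
        parser.feed(fetch(url))
        url = parser.hrefs[position - 1]   # replaces the counting for loop
        visited.append(url)
    return visited


# Hypothetical three-page site, used only for this illustration.
FAKE_PAGES = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="c.html">C</a> <a href="a.html">A</a>',
    "c.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
}

print(follow_links("a.html", position=2, repeats=3, fetch=FAKE_PAGES.get))
# ['c.html', 'b.html', 'a.html']
```

Changing position or repeats changes the result here, which is the behaviour the hard-coded 18 and 7 were hiding. Against the real pages, fetch could be something like lambda u: urllib.request.urlopen(u).read().decode(), assuming the pages are UTF-8.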
Source: https://stackoverflow.com/questions/54900251/can-someone-explain-me-in-detail-how-this-code-works-regarding-using-python-to