Can someone explain in detail how this code works (Using Python to Access Web Data)?

Submitted by 十年热恋 on 2020-12-06 16:10:44

Question


Use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.

This is the HTML link for the data: http://py4e-data.dr-chuck.net/known_by_Caragh.html

So I have to find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.

  1. Can someone explain, line by line and in detail, how these two loops ("while" and "for") work?
  2. When I enter position 18, does it extract the 18th href tag, then the 18th one on the next page, and so on 7 times? Because even if I enter a different number, I'm still getting the same answer. Thank you so much in advance.

Code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
n = 0
count = 0
url = input("Enter URL:")
numbers  = input("Enter count:")
position = input("Enter position:")

while n < 7:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == 18:
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
n = n + 1

Answer 1:


Because even if I enter a different number, I'm still getting the same answer.

You're getting the same answer because you've hard-coded those values with:

while n < 7

and

if count == 18

I think you meant to use your input variables there. You'll also need to convert those inputs to int, because input() returns them as str: with str values, n < numbers would raise a TypeError and count == position would never be true. Also note that I didn't want to type in the URL each time, so I hard-coded it; you can uncomment your input line and comment out url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html' instead.

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

n = 0
count = 0

url = 'http://py4e-data.dr-chuck.net/known_by_Caragh.html'
#url = input("Enter URL:")

numbers  = int(input("Enter count:"))
position = int(input("Enter position:"))

while n < numbers:    #<----- there's your variable of how many times to try
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
      count = count + 1
      if count == position:  #<------- and the variable to get the position
         url  = tag.get('href', None)
         print("Retrieving:" , url)
         count = 0
         break
    n = n + 1    #<---- I fixed your indentation. Previously this line sat outside the while loop, so n never incremented and the loop could never end.
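
To answer the "how do the two loops work" part more directly, here is a minimal sketch of the same logic, not a drop-in replacement for the code above. The outer loop runs once per page to visit; the inner for loop only counts anchor tags until it reaches the requested one, which is the same as indexing the list of links at position - 1. To keep the sketch runnable without network access or BeautifulSoup, it swaps in the standard library's html.parser and a small made-up set of pages; the names AnchorCollector, follow_links, fetch and pages are all hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


def follow_links(start_url, position, repeats, fetch):
    """Follow the link at `position` (1-based) on each page, `repeats` times."""
    url = start_url
    for _ in range(repeats):              # plays the role of the outer while loop
        parser = AnchorCollector()
        parser.feed(fetch(url))           # fetch(url) must return the page as a str
        url = parser.hrefs[position - 1]  # the link the counting for loop stops at
        print("Retrieving:", url)
    return url


# Made-up pages (an assumption for illustration, not the course data).
pages = {
    "a.html": '<a href="b.html">B</a><a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a><a href="c.html">C</a>',
    "c.html": '<a href="a.html">A</a><a href="b.html">B</a>',
}

last_url = follow_links("a.html", position=2, repeats=3, fetch=pages.get)
```

So yes: with position 18 and count 7, the code takes the 18th link on the first page, then the 18th link on the page that link leads to, and so on 7 times. Against a real site, fetch could be something like lambda u: urllib.request.urlopen(u).read().decode(), assuming the pages are UTF-8 encoded.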


Source: https://stackoverflow.com/questions/54900251/can-someone-explain-me-in-detail-how-this-code-works-regarding-using-python-to
