How to fix a page loop in Python during web scraping?

被撕碎了的回忆 2021-01-24 21:57

I am trying to loop through each page, but once it reaches the last page it just skips over the needed lines. The number of pages varies by link. So I need a dynamic solution for t

1 answer
  •  [愿得一人]
    2021-01-24 22:52

    You can get the desired output by replacing the loop with the one below.

    for link in allyearslink:
        driver.get(link)
        url = driver.current_url
        print(url)
        # click the last-page button so the active page number equals the page count
        driver.find_element(By.XPATH, "(//div[@id='pagination']//span)[last()]").click()
        time.sleep(3)  # fixed pause; an explicit wait would be more robust
        max_page = int(driver.find_element(By.CLASS_NAME, 'active-page').text)
    
        # visit every page of this link, from 1 to max_page
        for j in range(1, max_page + 1):
            current_page = url + '#/page/' + str(j)
            driver.get(current_page)
    
            for i in range(3):
                allelements = WebDriverWait(driver, 15).until(EC.visibility_of_all_elements_located(
                    (By.CSS_SELECTOR, "td.name.table-participant >a[href^='/basketball/europe/euroleague']")))
                print(allelements[i].text)
    
                scores.append(allelements[i].text)
                games.append(allelements[i].text)
    
                driver.execute_script("arguments[0].click();", allelements[i])
    
                time.sleep(2)
                # open the "AH" tab; click() returns None, so there is nothing to keep
                WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "AH"))).click()
                time.sleep(2)
                # the game date is in the element with class "date"
                date_ofGame = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".date")))
                print(date_ofGame.text)
                elem2 = driver.find_element(By.ID, "odds-data-table")
                scores.append(date_ofGame.text)
                scores.append(elem2.text)
                driver.back()
                time.sleep(2)
                driver.back()
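
The inner loop above builds each page address by appending '#/page/' and the page number to the link's URL. A minimal sketch of just that step, kept separate from the browser so it can be checked on its own; the function name page_urls and the example.com address are assumptions for illustration and do not appear in the original code:

```python
def page_urls(base_url, max_page):
    """Return the URL of every page, from page 1 to max_page inclusive."""
    return [f"{base_url}#/page/{page}" for page in range(1, max_page + 1)]


if __name__ == "__main__":
    # Hypothetical link, used only for illustration.
    for page_url in page_urls("https://example.com/results/", 3):
        print(page_url)
```

Because the list is derived from max_page, a link with fewer or more pages produces a shorter or longer list without any hard-coded page count.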
    

    The error came from the trailing / at the end of your original selector, td.name.table-participant >a[href^='/basketball/europe/euroleague/']. The loop above uses the same selector without that trailing slash.
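
The CSS attribute selector a[href^='...'] matches links whose href begins with the given text, so one extra character in the prefix can exclude links. The sketch below imitates that prefix test with str.startswith. The two href values are hypothetical: it assumes some links carry a season suffix right after "euroleague", which the original post does not show.

```python
def matches_prefix(href, prefix):
    """Mimic a[href^='prefix']: true when href starts with prefix."""
    return href.startswith(prefix)


# Hypothetical href values, assumed for illustration only.
hrefs = [
    "/basketball/europe/euroleague/team-a-team-b-abc123/",
    "/basketball/europe/euroleague-2018-2019/team-c-team-d-def456/",
]

with_slash = [h for h in hrefs if matches_prefix(h, "/basketball/europe/euroleague/")]
without_slash = [h for h in hrefs if matches_prefix(h, "/basketball/europe/euroleague")]

print(len(with_slash))     # 1
print(len(without_slash))  # 2
```

Under that assumption, the prefix with the trailing slash finds no links on such pages, which would leave the wait for visible elements to time out.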

    Here is the sample output:
