Selenium scraping with multiple urls

别那么骄傲 2021-02-06 10:02

Following my previous question, I'm now trying to scrape multiple pages of a URL (all the pages with games in a given season). I'm also trying to scrape multiple parent URLs (

1 Answer
  • 2021-02-06 10:44

    What you need to do is:

    • reuse the same webdriver instance - do not initialize it in the loop
    • introduce Explicit Waits - they return as soon as the awaited element appears, which makes the code more reliable and usually faster than pausing for a fixed time
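
    To illustrate the second point, here is a rough pure-Python sketch of the polling idea behind an explicit wait. The names `wait_until` and `rows_loaded` are hypothetical and made up for this illustration; they are not part of Selenium, and `WebDriverWait` itself does more (for example, it ignores certain exceptions while polling).

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in condition (hypothetical): becomes true on the third check,
# the way a results table might appear after a short delay.
calls = {"count": 0}


def rows_loaded():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(rows_loaded, timeout=5.0, poll_interval=0.01)
```

    The loop stops at the first successful check, so a page that loads quickly costs almost no waiting, while a slow page still gets the full timeout before an error is raised.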

    Implementation:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    import pandas as pd
    
    
    urls = [
        'http://www.oddsportal.com/hockey/austria/ebel-2014-2015/results/#/page/',
        'http://www.oddsportal.com/hockey/austria/ebel-2013-2014/results/#/page/'
    ]
    
    data = []
    
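    # PhantomJS support was removed in Selenium 4; any supported browser driver works here
    driver = webdriver.Chrome()
    # use the explicit wait only: the Selenium docs warn against mixing implicit and explicit waits
    wait = WebDriverWait(driver, 10)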
    
    for url in urls:
        for page in range(1, 8):
            driver.get(url + str(page))
            # wait for the page to load
            wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#tournamentTable tr.deactivate")))
    
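            # find_element(s)(By..., ...) replaces the find_element(s)_by_* helpers removed in Selenium 4
            for match in driver.find_elements(By.CSS_SELECTOR, "div#tournamentTable tr.deactivate"):
                home, away = match.find_element(By.CLASS_NAME, "table-participant").text.split(" - ", 1)
                # the date (and optional event) sits in the nearest preceding header cell
                date = match.find_element(By.XPATH, ".//preceding::th[contains(@class, 'first2')][1]").text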
    
                if " - " in date:
                    date, event = date.split(" - ")
                else:
                    event = "Not specified"
    
                data.append({
                    "home": home.strip(),
                    "away": away.strip(),
                    "date": date.strip(),
                    "event": event.strip()
                })
    
    # quit() ends the whole browser session, not just the current window
    driver.quit()
    
    df = pd.DataFrame(data)
    print(df)
    

    Prints:

                       away         date          event                home
    0              Salzburg  14 Apr 2015      Play Offs     Vienna Capitals
    1       Vienna Capitals  12 Apr 2015      Play Offs            Salzburg
    2              Salzburg  10 Apr 2015      Play Offs     Vienna Capitals
    3       Vienna Capitals  07 Apr 2015      Play Offs            Salzburg
    4       Vienna Capitals  31 Mar 2015      Play Offs         Liwest Linz
    5              Salzburg  29 Mar 2015      Play Offs          Klagenfurt
    6           Liwest Linz  29 Mar 2015      Play Offs     Vienna Capitals
    7            Klagenfurt  26 Mar 2015      Play Offs            Salzburg
    8       Vienna Capitals  26 Mar 2015      Play Offs         Liwest Linz
    9           Liwest Linz  24 Mar 2015      Play Offs     Vienna Capitals
    10             Salzburg  24 Mar 2015      Play Offs          Klagenfurt
    11           Klagenfurt  22 Mar 2015      Play Offs            Salzburg
    12      Vienna Capitals  22 Mar 2015      Play Offs         Liwest Linz
    13              Bolzano  20 Mar 2015      Play Offs         Liwest Linz
    14        Fehervar AV19  18 Mar 2015      Play Offs     Vienna Capitals
    15          Liwest Linz  17 Mar 2015      Play Offs             Bolzano
    16      Vienna Capitals  16 Mar 2015      Play Offs       Fehervar AV19
    17              Villach  15 Mar 2015      Play Offs            Salzburg
    18           Klagenfurt  15 Mar 2015      Play Offs              Znojmo
    19              Bolzano  15 Mar 2015      Play Offs         Liwest Linz
    20          Liwest Linz  13 Mar 2015      Play Offs             Bolzano
    21        Fehervar AV19  13 Mar 2015      Play Offs     Vienna Capitals
    22               Znojmo  13 Mar 2015      Play Offs          Klagenfurt
    23             Salzburg  13 Mar 2015      Play Offs             Villach
    24           Klagenfurt  10 Mar 2015      Play Offs              Znojmo
    25      Vienna Capitals  10 Mar 2015      Play Offs       Fehervar AV19
    26              Bolzano  10 Mar 2015      Play Offs         Liwest Linz
    27              Villach  10 Mar 2015      Play Offs            Salzburg
    28          Liwest Linz  08 Mar 2015      Play Offs             Bolzano
    29               Znojmo  08 Mar 2015      Play Offs          Klagenfurt
    ..                  ...          ...            ...                 ...
    670       TWK Innsbruck  28 Sep 2013  Not specified              Znojmo
    671         Liwest Linz  27 Sep 2013  Not specified            Dornbirn
    672             Bolzano  27 Sep 2013  Not specified          Graz 99ers
    673          Klagenfurt  27 Sep 2013  Not specified  Olimpija Ljubljana
    674       Fehervar AV19  27 Sep 2013  Not specified            Salzburg
    675       TWK Innsbruck  27 Sep 2013  Not specified     Vienna Capitals
    676             Villach  27 Sep 2013  Not specified              Znojmo
    677            Salzburg  24 Sep 2013  Not specified  Olimpija Ljubljana
    678            Dornbirn  22 Sep 2013  Not specified       TWK Innsbruck
    679          Graz 99ers  22 Sep 2013  Not specified          Klagenfurt
    680     Vienna Capitals  22 Sep 2013  Not specified             Villach
    681       Fehervar AV19  21 Sep 2013  Not specified             Bolzano
    682            Dornbirn  20 Sep 2013  Not specified             Bolzano
    683             Villach  20 Sep 2013  Not specified          Graz 99ers
    684              Znojmo  20 Sep 2013  Not specified          Klagenfurt
    685  Olimpija Ljubljana  20 Sep 2013  Not specified         Liwest Linz
    686       Fehervar AV19  20 Sep 2013  Not specified       TWK Innsbruck
    687            Salzburg  20 Sep 2013  Not specified     Vienna Capitals
    688             Villach  15 Sep 2013  Not specified          Klagenfurt
    689         Liwest Linz  15 Sep 2013  Not specified            Dornbirn
    690     Vienna Capitals  15 Sep 2013  Not specified       Fehervar AV19
    691       TWK Innsbruck  15 Sep 2013  Not specified            Salzburg
    692          Graz 99ers  15 Sep 2013  Not specified              Znojmo
    693  Olimpija Ljubljana  14 Sep 2013  Not specified            Dornbirn
    694             Bolzano  14 Sep 2013  Not specified       Fehervar AV19
    695          Klagenfurt  13 Sep 2013  Not specified          Graz 99ers
    696              Znojmo  13 Sep 2013  Not specified            Salzburg
    697  Olimpija Ljubljana  13 Sep 2013  Not specified       TWK Innsbruck
    698             Bolzano  13 Sep 2013  Not specified     Vienna Capitals
    699         Liwest Linz  13 Sep 2013  Not specified             Villach
    
    [700 rows x 4 columns]
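
    The row-parsing step inside the loop can also be pulled out into small pure functions, so it can be checked without starting a browser. This is only a sketch: `split_teams`, `split_date_header` and `build_row` are hypothetical helper names, and the input strings are assumed to look like the ones the loop reads ("Home - Away" and "date - event").

```python
def split_teams(text):
    """Split 'Home - Away' into (home, away), stripping extra whitespace."""
    home, sep, away = text.partition(" - ")
    if not sep:
        raise ValueError(f"unexpected participants text: {text!r}")
    return home.strip(), away.strip()


def split_date_header(text, default_event="Not specified"):
    """Split 'date - event' into (date, event); use the default when no event is given."""
    date, sep, event = text.partition(" - ")
    return date.strip(), (event.strip() if sep else default_event)


def build_row(participants_text, header_text):
    """Build one result row from the two text fragments read for a match."""
    home, away = split_teams(participants_text)
    date, event = split_date_header(header_text)
    return {"home": home, "away": away, "date": date, "event": event}


row = build_row("Vienna Capitals - Salzburg", "14 Apr 2015 - Play Offs")
row_without_event = build_row("Villach - Liwest Linz", "13 Sep 2013")
```

    Inside the scraping loop, `build_row` would receive the `.text` of the participant cell and of the preceding header cell, and its result would be appended to `data`.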
    