Trying to scrape email address from website

不思量自难忘° 2021-01-23 13:02

I was trying to scrape this website:

www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=

I did scrape it using

2 Answers
  • 2021-01-23 13:51

    Using Beautiful Soup

    A simple way to get the email is to look for the div with class 'field-name-field-mu-email', and then convert the obfuscated display text ("user [at] domain") into a proper email format.

    For instance:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll='

    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Each address sits in a div with this class, displayed as "user [at] domain"
    for div in soup.find_all('div', attrs={'class': 'field-name-field-mu-email'}):
        print(div.find('span').text.replace(' [at] ', '@'))
    
    Out[1]:
    alpcharge@sasktel.net
    guc-eug@bellnet.ca
    pioneerpastoralcharge@gmail.com
    acmeunitedchurch@gmail.com
    cmcphers@lakeheadu.ca
    mbm@kos.net
    tommaclaren@gmail.com
    agassizunited@shaw.ca
    buchurch@xplornet.com
    dmitchell008@yahoo.ca
    karen.charlie62@gmail.com
    trinityucbdn@westman.wave.ca
    gepc.ucc.mail@gmail.com
    monacampbell181@gmail.com
    herbklaehn@gmail.com
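
The replace step above can be pulled into a small helper, which may be easier to reuse and test. This is only a sketch: the function name deobfuscate_email is hypothetical, and handling of " [dot] " is an assumption (the output above only shows the " [at] " form on this site).

```python
import re

def deobfuscate_email(text):
    """Turn 'user [at] example.org' into 'user@example.org'."""
    # " [at] " is the form seen in the page text; match it case-insensitively
    text = re.sub(r"\s*\[at\]\s*", "@", text, flags=re.IGNORECASE)
    # Assumption: some pages may also obfuscate dots as " [dot] "
    text = re.sub(r"\s*\[dot\]\s*", ".", text, flags=re.IGNORECASE)
    return text.strip()

# Placeholder address, not taken from the site
print(deobfuscate_email("someone [at] example.org"))
```

In the loop above, it would be called as deobfuscate_email(div.find('span').text) in place of the .replace(...) call.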
    
    
  • You can try web scraping with Selenium. I tried this code and it gives the expected results.

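    from selenium import webdriver
    from bs4 import BeautifulSoup

    # Selenium 4.6+ finds chromedriver itself; older versions need the driver path
    driver = webdriver.Chrome()
    driver.get("https://www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=")

    # Grab the rendered page, then close the browser
    content = driver.page_source
    driver.quit()
    soup = BeautifulSoup(content, 'html.parser')

    # In the rendered page the addresses are links with class "spamspan"
    for all_emails in soup.find_all('a', class_="spamspan"):
        print(all_emails.text)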
    

    Results:

    alpcharge@sasktel.net
    guc-eug@bellnet.ca
    pioneerpastoralcharge@gmail.com
    acmeunitedchurch@gmail.com
    cmcphers@lakeheadu.ca
    mbm@kos.net
    tommaclaren@gmail.com
    agassizunited@shaw.ca
    buchurch@xplornet.com
    dmitchell008@yahoo.ca
    karen.charlie62@gmail.com
    trinityucbdn@westman.wave.ca
    gepc.ucc.mail@gmail.com
    monacampbell181@gmail.com
    herbklaehn@gmail.com
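
To check the 'a' / class="spamspan" selection without launching a browser, the same filter can be tried on a saved HTML string using only the standard library. This is a minimal sketch: the class name SpamSpanLinkParser and the sample markup are hypothetical, and the real page's markup may differ from the sample.

```python
from html.parser import HTMLParser

class SpamSpanLinkParser(HTMLParser):
    """Collect the text of every <a> element whose class includes 'spamspan'."""

    def __init__(self):
        super().__init__()
        self.emails = []
        self._buffer = None  # text chunks collected while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "spamspan" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.emails.append("".join(self._buffer).strip())
            self._buffer = None

# Hypothetical sample markup with a placeholder address
sample_html = '''
<div class="field-name-field-mu-email">
  <a href="mailto:someone@example.org" class="spamspan">someone@example.org</a>
</div>
<a href="/contact">Contact</a>
'''

parser = SpamSpanLinkParser()
parser.feed(sample_html)
print(parser.emails)
```

With Selenium, driver.page_source could be passed to parser.feed() instead of the sample string.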
    