Trying to scrape email address from website

不思量自难忘° 2021-01-23 13:02

I was trying to scrape this website:

www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=

I did scrape it using

2 Answers

再見小時候 (original poster)

2021-01-23 13:51

Using Beautiful Soup

A simple way to get the emails is to look for each div with class 'field-name-field-mu-email' and then convert the obfuscated text (displayed with ' [at] ' in place of '@') into a proper email address.

For instance:

import requests
from bs4 import BeautifulSoup

url = 'https://www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll='

# Download the search results page and parse the HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Each email sits in a span inside a div with this class, shown as "name [at] domain"
for div in soup.find_all('div', class_='field-name-field-mu-email'):
    print(div.find('span').text.replace(' [at] ', '@'))

Out[1]:
alpcharge@sasktel.net
guc-eug@bellnet.ca
pioneerpastoralcharge@gmail.com
acmeunitedchurch@gmail.com
cmcphers@lakeheadu.ca
mbm@kos.net
tommaclaren@gmail.com
agassizunited@shaw.ca
buchurch@xplornet.com
dmitchell008@yahoo.ca
karen.charlie62@gmail.com
trinityucbdn@westman.wave.ca
gepc.ucc.mail@gmail.com
monacampbell181@gmail.com
herbklaehn@gmail.com
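
The replace call above only matches the exact string ' [at] ' with one space on each side. If the spacing or capitalisation ever varies (an assumption, I have not seen that on this page), a slightly more tolerant version of the same replacement could look like the sketch below. The helper name deobfuscate is made up for illustration and is not part of Beautiful Soup.

```python
import re

# Hypothetical helper, not part of any library: matches "[at]" in any case,
# with any amount of whitespace (including none) on either side.
_AT_PATTERN = re.compile(r'\s*\[at\]\s*', re.IGNORECASE)

def deobfuscate(text):
    """Turn 'name [at] domain' into 'name@domain'; other text is only stripped."""
    return _AT_PATTERN.sub('@', text.strip())

print(deobfuscate('mbm [at] kos.net'))
```

In the loop you would then call deobfuscate(div.find('span').text) instead of chaining .replace().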
