Question
OK guys, so I'm very much a beginner here. I'm trying to scrape a website for company names and their corresponding phone numbers. The end goal is to write these to a CSV file that can be opened in Excel.
Currently I can retrieve the company names and the phone numbers separately. I was thinking I could merge the two lists somehow, but I'm concerned that a single outlier entry would offset the whole merge and pair numbers with the wrong names.
What is the best way to accomplish this?
from urllib import request
from bs4 import BeautifulSoup
url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
data1 = soup.findAll('span', {'itemprop':'name'})
data2 = soup.findAll('a', {'itemprop':'telephone'})
datalist1 = []
datalist2 = []
for i in data1:
    datalist1.append(i.string)
for i in data2:
    datalist2.append(i.string)
x = zip(datalist1, datalist2)
print(list(x))
Is it possible to pull name and phone in the same soup function in order to preserve their connection?
Any help would be appreciated!
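To illustrate the concern above, here is a minimal sketch with made-up names and numbers (not data from the site). If one company has no phone element, the two lists differ in length, and zip pairs purely by position: every later number shifts to the wrong company and the trailing name is silently dropped.

```python
# Hypothetical sample data: the second company has no phone number on the page,
# so the phone list is one item shorter than the name list.
names = ["Acme Bolts", "No-Phone Fasteners", "Zeta Screws"]
phones = ["800-111-1111", "800-333-3333"]  # the second number belongs to Zeta Screws

# zip pairs items by position and stops at the end of the shorter list.
pairs = list(zip(names, phones))

for name, phone in pairs:
    print(name, phone)
```

Here the second pair attaches Zeta Screws' number to No-Phone Fasteners, and Zeta Screws never appears at all, which is why looking up both fields inside the same parent element is the safer route.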
Answer 1:
import requests
from bs4 import BeautifulSoup
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Each company sits in its own <h3 class="cname"> header.
    target = soup.select("h3.cname")
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for tar in target:
            # Look up both fields inside the same header so they stay paired.
            name = tar.find("span", itemprop="name").text
            phone = tar.find("a", itemprop="telephone").text
            writer.writerow([name, phone])


main("https://www.iqsdirectory.com/bolts/bolts-2/")
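Note that find() returns None when nothing matches, so the code above would raise AttributeError on a header that lacks either element. A possible variant that tolerates this is sketched below, under stated assumptions: SAMPLE_HTML is hypothetical markup modelled on the selectors used above (not copied from the site), text_or_blank and rows_from_html are illustrative helper names, and io.StringIO stands in for the real data.csv file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors used above; not copied from the site.
SAMPLE_HTML = """
<h3 class="cname"><span itemprop="name">Acme Bolts</span>
  <a itemprop="telephone">800-111-1111</a></h3>
<h3 class="cname"><span itemprop="name">No-Phone Fasteners</span></h3>
"""


def text_or_blank(tag):
    """Return the stripped text of a tag, or '' when find() returned None."""
    return tag.get_text(strip=True) if tag is not None else ""


def rows_from_html(html):
    """Build one [name, phone] row per h3.cname header, blank where a field is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for header in soup.select("h3.cname"):
        name = text_or_blank(header.find("span", itemprop="name"))
        phone = text_or_blank(header.find("a", itemprop="telephone"))
        rows.append([name, phone])
    return rows


rows = rows_from_html(SAMPLE_HTML)

buffer = io.StringIO()  # stands in for open("data.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["Name", "Phone"])
writer.writerows(rows)
print(buffer.getvalue())
```

Every row keeps two columns, so a company without a phone number shows up in Excel with an empty Phone cell instead of stopping the script.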
Answer 2:
Here is a solution that fits your needs. If a name or number does not exist, it will not be represented in that company's list. When find() matches nothing it returns None, so reading .text from the result raises AttributeError, which is the exception to catch.
The idea is as I explained in my comment. I get a list of the headers. For each header, I try to find the name and the number. If I can't find one, I catch the exception; if I can, I append it to company. Each company is then appended to companies. The result is a list of companies, where each company is a list containing a name and a number.
from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
headers = soup.findAll('h3', {"class": 'cname'})
companies = []
for header in headers:
    company = []
    try:
        company.append(header.find('span', {'itemprop': 'name'}).text)
    except AttributeError as e:
        # find() returned None: this header has no name element.
        print(e)
    try:
        company.append(header.find('a', {'itemprop': 'telephone'}).text)
    except AttributeError as e:
        # find() returned None: this header has no telephone element.
        print(e)
    companies.append(company)
print(companies)
Your result is:
[['A & J Fastener Corp.', '877-563-2658'], ['AA Anchor Bolt, Inc.', '800-929-3845'], ['Abbott-Interfast Corporation', '800-877-0789'], ['Accurate Manufactured Products Group, Inc.', '317-472-9000'], ['ACF Components & Fasteners, Inc.', '800-824-5449'], ['Aerospace Manufacturing Corporation', '973-472-2300'], ['Aetna Screw Products Co.', '847-647-9555'], ['AFT Fasteners', '877-844-8595'], ['AJ Fasteners Inc.', '714-630-1556'], ['All-Ways Fasteners, Inc.', '800-870-0372'], ['Amco Enterprises', '866-651-2626'], ['American Bolt Corp.', '262-786-6530'], ['Anchor Bolt & Screw Company', '847-841-7000'], ['Anchor Bolt Source', '888-812-6587'], ['Ancrabec', '888-649-7203'], ['Armour Screw Company', '800-726-4563'], ['Aspen Fasteners', '800-479-0056'], ['Assembly Products, Inc.', '608-296-1666'], ['Associated Fastening Products, Inc.', '888-696-0709'], ['Atwood Industries', '800-362-2059'], ['B&G Manufacturing', '800-366-3067'], ['Baco Enterprises, Inc.', '800-622-2226'], ['Barnhill Bolt Co., Inc.', '800-472-3900'], ['Birmingham Fastener Manufacturing', '800-695-3511'], ['Blue Ribbon Fastener Co.', '847-673-1248'], ['BMB Fasteners, Inc.', '973-256-4010'], ['Bolt Products, Inc.', '800-423-6503'], ['Bossard North America, Inc.', '800-772-2738'], ['Bowie Bolt & Supply, Inc.', '800-337-9650'], ['British Metrics', '800-762-5134'], ['Brunner Manufacturing Co., Inc.', '608-847-6667'], ['Buckeye Fasteners, Inc.', '800-437-1689'], ['C&L Rivet Company, Inc.', '215-672-1113'], ['Cal-Fasteners, Inc.', '714-854-1715'], ['California Bolt Co.', '714-957-6000'], ['Champion Bolt & Supply', '425-339-2632'], ['Chicago Hardware & Fixture Company', '847-455-6609'], ['Chicago Nut & Bolt', '888-529-8600'], ['Circle Bolt & Nut Co., Inc.', '800-548-2658'], ['Coburn-Myers Fastening Systems Incorporated', '800-662-7459'], ['Connor Fastener', '478-742-7261'], ['Cordova Bolt, Inc.', '800-421-3435'], ['DAN-LOC Bolt & Gasket', '800-231-6355'], ['Dayton Nut & Bolt Co., Inc.', '888-711-2658'], ['Deco 
Manufacturing Company', '800-637-5861'], ['Delta Fastener Corp.', '800-670-5938'], ['Diamond Fasteners', '877-729-6283'], ['Dyson Corporation', '800-680-3600'], ['E & T Fasteners', '800-650-4707'], ['East Coast Metals, Inc.', '800-355-2060'], ['Eastwood Manufacturing', '281-447-0081'], ['EBC Industries', '814-456-4287'], ['Elgin Equipment Group', '630-434-7200'], ['Elgin Fastener Group', '812-689-8990'], ['Engineered Components Company', '847-841-7000'], ['EPS Engineered Parts Sourcing Inc.', '877-889-1017'], ['Falcon Fastening Solutions', '502-266-6292'], ['FASCO, Inc.', '708-371-0747'], ['Fast-Rite International, Inc.', '888-327-8077'], ['Fastenal Company', '507-454-5374'], ['Fastener Dimensions, Inc.', '800-969-2188'], ['Fastener Solutions, Inc.', '866-463-2910'], ['Fastener SuperStore, Inc.', '866-688-2500'], ['Fastener Tool & Supply, Inc.', '800-662-9232'], ['Fasteners Plus International', '708-479-5558'], ['Fasteners Unlimited, Inc.', '724-776-7273'], ['Fastening Products of Lancaster, Inc.', '717-299-5771'], ['FM Stainless Fasteners', '800-749-1115'], ['Genesis Bolt & Supply', '866-276-1399'], ['Global Certified Fastener', '708-450-9301'], ['Global Fastener & Supply, Inc.', '800-785-2664'], ['Guidon Corporation', '856-866-8808'], ['Haydon Bolts, Inc.', '215-537-8700'], ['Hayes Bolt & Supply', '619-231-5966'], ['HC Pacific', '909-598-0509'], ['Hercules Fasteners', '800-332-7320'], ['Hudson Fasteners, Inc.', '877-427-2739'], ['Hydra-Dynamics, Inc.', '936-273-2882'], ['Infinity Fasteners', '913-438-2252'], ['IntegraTECH Distribution', '603-880-3760'], ['J.P. 
Ruklic Screw Company', '708-339-3600'], ['K-T Bolt Manufacturing, Inc.', '800-553-4521'], ['KelKo Products Company', '800-346-7883'], ['Kinter', '800-323-2389'], ['Lamons Fastener Division', '713-673-5376'], ['Lamons Gasket Company', '800-231-6906'], ['Larson Hardware Manufacturing Company', '815-625-0503'], ['Lincoln Structural Solutions', '402-952-4400'], ['Master Bolt Manufacturing, Inc.', '888-905-2658'], ['Melfast, Inc', '973-227-0045'], ['Micro Plastics, Inc', '(870)453-2261'], ['Mid-States Bolt & Screw Co.', '800-482-0867'], ['Mutual Screw & Supply', '800-222-0324'], ['National Bolt & Nut Corporation', '630-307-8800'], ['Nickel Systems, Inc.', '215-855-5633'], ['Nord-Lock / Superbolt®, Inc.', '412-279-1149'], ['Norwood Screw Machine Parts', '800-437-6644'], ['Nova Fasteners Co. Inc.', '877-541-7222'], ['O.E.M. Fastening Systems', '800-928-7439'], ['O.E.M. Hardware', '800-663-6554'], ['Ocean State Stainless, Inc.', '800-394-6396'], ['Palmer Bolt & Supply Co.', '(937)778-9606'], ['Parker Fasteners', '623-925-5998'], ['PennEngineering®', '800-342-5736'], ['Pohl Spring Works, Inc.', '800-777-1284'], ['Product Components Corporation', '800-336-0406'], ['Production Materials Inc.', '224-434-2290'], ['R&R Engineering Company Inc.', '800-979-1921'], ['Reco Industries', '636-639-6010'], ['Remco Bolt', '800-460-3327'], ['ROBNET', '410-247-7273'], ['SASCO Fasteners', '800-779-2024'], ['SC Fastening Systems, LLC.', '330-468-3300'], ['Screw Products International', '800-876-5153'], ['Secure Fastener & Tool Company', '201-939-4422'], ['Specialty Bolt & Screw, Inc.', '413-789-6700'], ['Specialty Screw Corporation', '815-969-4100'], ['St. Louis Screw & Bolt', '800-237-7059'], ['Stalcop', '765-436-7926'], ['Stanley Industries Inc.', '800-253-2658'], ['Stelfast® Inc.', '800-729-9779'], ['Suncor Stainless, Inc.', '800-394-2222'], ['Sunny Screw Industry Co. 
Ltd.', '770-351-2858'], ['Tanner Bolt & Nut Corp.', '800-456-2658'], ['Tengco', '714-676-8200'], ['The Federal Group', '800-759-2658'], ['Tripac', '951-280-4488'], ['TSA Manufacturing', '800-228-2948'], ['United Titanium, Inc.', '844-321-4684'], ['USP Aerospace Solutions, Inc.', '631-287-6321'], ['Valtra, Inc.', '800-989-5244'], ['Wayne Bolt & Nut Company', '800-521-2207'], ['WINK Fasteners, Inc.', '804-966-8111'], ['Wodin, Inc.', '440-439-4222'], ['Wurth Industry', '800-428-4686'], ['Yangtze Railroad Materials', '855-889-2648']]
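To get from this list to the CSV file the question asks for, something along these lines should work. It is only a sketch: write_companies is an illustrative helper name, the three rows are the first entries of the result above, and io.StringIO stands in for a real file opened with open("data.csv", "w", newline=""). Lists shorter than two items (a header where the name or number was missing) are skipped here, which is one possible policy rather than the only one.

```python
import csv
import io

# First three entries of the result above.
companies = [
    ['A & J Fastener Corp.', '877-563-2658'],
    ['AA Anchor Bolt, Inc.', '800-929-3845'],
    ['Abbott-Interfast Corporation', '800-877-0789'],
]


def write_companies(rows, handle):
    """Write a header plus every complete [name, phone] pair; return the pair count."""
    writer = csv.writer(handle)
    writer.writerow(["Name", "Phone"])
    # A list with fewer than two items means a field was missing for that header.
    complete = [row for row in rows if len(row) == 2]
    writer.writerows(complete)
    return len(complete)


buffer = io.StringIO()  # stands in for open("data.csv", "w", newline="")
written = write_companies(companies, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The csv module quotes any field that contains a comma, so a name such as 'AA Anchor Bolt, Inc.' still lands in a single Excel cell.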
Source: https://stackoverflow.com/questions/61258326/python-beautifulsoup-scraping-how-to-combine-two-different-fields-or-pair-them