Question
I'm doing web scraping as my first Python project (I'm completely new to programming). I'm almost done, but some values on the web page are missing, and I want to replace each missing value with something like "0" or "Not found". I only want to build a CSV file from the data; I'm not going any further with the analysis.
The web page I'm scraping is: https://www.lamudi.com.mx/nuevo-leon/departamento/for-rent/?page=1
I have a loop that collects all of the links on the page and then visits each one to scrape the data and save it in a list. However, some of my lists end up with fewer elements than others. I want my program to detect when a value is missing and append "0" or "Not found" to my "sizes" list.
For collecting the links on the page:
tags = soup('a',{'class':'js-listing-link'})
for tag in tags:
    link = tag.get('href')
    if link not in links:
        links.append(link)
print("Number of Links:", len(links))
For collecting the size of each apartment:
for link in links:
    size = soup('span',{'class':'Overview-attribute icon-livingsize-v4'})
    for mysize in size:
        mysize = mysize.get_text().strip()
        sizes.append(mysize)
print("Number of Sizes:", len(sizes))
Answer 1:
On this page, you can select all listing rows (with .select('.ListingCell-row')) and then select the information within each row, substituting '-' for any missing value:
import requests
from bs4 import BeautifulSoup
url = 'https://www.lamudi.com.mx/nuevo-leon/departamento/for-rent/?page=1'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
for row in soup.select('.ListingCell-row'):
    name = row.h3.get_text(strip=True)
    link = row.h3.a['href']
    size = row.select_one('.icon-livingsize')
    size = size.get_text(strip=True) if size else '-'
    print(name)
    print(link)
    print(size)
    print('-' * 80)
Prints:
Loft en Renta Amueblado Una Recámara Cerca Udem
https://www.lamudi.com.mx/loft-en-renta-amueblado-una-recamara-cerca-udem.html
50 m²
--------------------------------------------------------------------------------
DEPARTAMENTO EN RENTA SAN JERONIMO EQUIPADO
https://www.lamudi.com.mx/departamento-en-renta-san-jeronimo-equipado.html
-
--------------------------------------------------------------------------------
Departamento - Narvarte
https://www.lamudi.com.mx/departamento-narvarte-58.html
60 m²
--------------------------------------------------------------------------------
...and so on.
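Since the goal is a CSV file, here is a minimal sketch of one way to write such rows out with the standard csv module. It assumes each listing was collected as a dict inside the loop above, with the "size" key simply left out when nothing was found. The rows list, the rows_to_csv helper and the "Not found" placeholder are hypothetical names chosen for illustration; the sample values are copied from the output shown above.

```python
import csv
import io

# Hypothetical records shaped like the printed output above; the "size"
# key is omitted for the listing that has no size element on the page.
rows = [
    {"name": "Loft en Renta Amueblado Una Recámara Cerca Udem",
     "link": "https://www.lamudi.com.mx/loft-en-renta-amueblado-una-recamara-cerca-udem.html",
     "size": "50 m²"},
    {"name": "DEPARTAMENTO EN RENTA SAN JERONIMO EQUIPADO",
     "link": "https://www.lamudi.com.mx/departamento-en-renta-san-jeronimo-equipado.html"},
    {"name": "Departamento - Narvarte",
     "link": "https://www.lamudi.com.mx/departamento-narvarte-58.html",
     "size": "60 m²"},
]


def rows_to_csv(rows, fieldnames=("name", "link", "size"), missing="Not found"):
    """Return CSV text; any key absent from a row is written as `missing`."""
    buffer = io.StringIO()
    # restval is the value DictWriter writes for keys a row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            restval=missing, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write to disk instead, the same DictWriter can be given a file opened with open('listings.csv', 'w', newline='', encoding='utf-8') (the filename is an assumption). Keeping one dict per listing, rather than separate lists per field, also means a missing value cannot shift the other columns out of alignment.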
Source: https://stackoverflow.com/questions/64296201/missing-values-while-scraping-using-beautifulsoup-in-python